Tycoon Talk

ASP.NET Forum


html file parsing?

HockeyFan, 11-21-2005, 05:12 PM
Does anyone know of an object in the Microsoft environment that will let you parse an HTML file? What I want to do is write a program that reads an HTML file and finds all the <a name="..."> tags.
I'd like to get the value of the name attribute, as well as the text between the <a name="..."> and </a> tags.

Obviously I could write a program to parse the file myself, but if something already exists for this (as it does now for XML files), I'd rather use that than write my own.

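For comparison, this is roughly what the "use an existing parser" approach looks like. It is sketched in Python with the standard library's html.parser.HTMLParser rather than any Microsoft object, so it illustrates the idea, not an ASP solution. The class name AnchorNameCollector and the sample markup are made up for the example. The class records the name attribute of every <a> tag that has one, together with the text up to the matching </a>, including text inside nested tags such as <b>.

```python
from html.parser import HTMLParser


class AnchorNameCollector(HTMLParser):
    """Hypothetical helper: collects (name, text) for every <a> with a name attribute."""

    def __init__(self):
        super().__init__()
        self.results = []     # list of (name, link_text) tuples, in document order
        self._current = None  # name of the <a name=...> we are currently inside, if any
        self._text = []       # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names lowercased; attrs is a list of pairs
        if tag == "a":
            name = dict(attrs).get("name")
            if name is not None:
                self._current = name
                self._text = []

    def handle_data(self, data):
        # data may arrive in several chunks (e.g. around a nested <b>), so collect them all
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.results.append((self._current, "".join(self._text).strip()))
            self._current = None


sample = """
<p><a name="intro">Introduction</a> and a
<a href="/other.html">normal link</a>.</p>
<A NAME='part2'>Part <b>two</b></A>
"""

collector = AnchorNameCollector()
collector.feed(sample)
collector.close()
print(collector.results)  # [('intro', 'Introduction'), ('part2', 'Part two')]
```

Because HTMLParser lowercases tag and attribute names, <A NAME='part2'> is caught by the same check as <a name="intro">, and the plain href link is skipped.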
kline11, 11-21-2005, 06:36 PM
Use ASP "Regular Expression Patterns" (regular expressions). Just search Google for more information.

HockeyFan, 11-21-2005, 06:50 PM
I was thinking of something more along the lines of the MSHTML document object: use CreateDocumentFromURL to get the document object, then loop through the tags looking for the <a name> ones.
The thing is, I'm not sure I can restrict the search to <a name> tags. I think I can only look for anchors in general (i.e. <a ) and then work out which ones are name anchors and which are href links. Just wondering if anyone else has experience with this.
I've done a little of this in JavaScript, but never in VB, so I'm eager to try.
If I can get it working in VB, I'll do it in VBScript as well, just for fun.
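The "find every anchor, then work out which is which" loop described above can be sketched the same way. Again this uses Python's html.parser instead of MSHTML, so only the logic carries over; _AnchorLister and classify_anchors are hypothetical names. Since a single <a> tag may carry both name and href, it can show up in both lists.

```python
from html.parser import HTMLParser


class _AnchorLister(HTMLParser):
    """Records the attributes of every <a> start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # one dict of attributes per <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def classify_anchors(markup):
    """Hypothetical helper: split all anchors into named targets and hyperlinks."""
    lister = _AnchorLister()
    lister.feed(markup)
    lister.close()
    # an anchor counts as "named" if it has a non-empty name, as a link if it has an href
    named = [a["name"] for a in lister.anchors if a.get("name")]
    links = [a["href"] for a in lister.anchors if a.get("href")]
    return named, links


sample = '<a name="top"></a><a href="#top">Back to top</a><a name="faq" href="faq.html">FAQ</a>'
named, links = classify_anchors(sample)
print(named)  # ['top', 'faq']
print(links)  # ['#top', 'faq.html']
```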
 
kline11, 11-21-2005, 08:44 PM
I think ASP regular expression patterns (VBScript) are what you need, something like:
(?<anchor><\s*a\s*((?:\b\w+\b\s*(?:=\s*(?:"[^"]*"|'[^']*'|[^"'<> ]+)\s*)?)*)/?\s*>)(?<linktext>.*?)<\s*/a\s*>

to find both named anchors (<a name=...>) and links (<a href=...>).
This is the ASP forum, right?
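To try the pattern outside ASP, here is a rough Python translation. The (?<name>...) named-group syntax in the post is the .NET flavor; Python's re module spells it (?P<name>...), and as far as I know the VBScript RegExp object used in classic ASP has no named groups at all, so there you would read the numbered submatches instead. The attrs group name and the NAME_ATTR follow-up pattern are assumptions added for illustration. The usual caveat applies: this is pattern matching, not parsing, so it will miss unusual markup, for example link text that spans more than one line, because . does not match a newline without re.DOTALL.

```python
import re

# Python spelling of the pattern from the post: (?P<name>...) instead of (?<name>...).
# The "attrs" group name is hypothetical; the original name of that group is unknown.
# linktext is lazy (.*?) so two anchors on one line are matched separately.
ANCHOR = re.compile(
    r"""(?P<anchor><\s*a\s*(?P<attrs>(?:\b\w+\b\s*(?:=\s*(?:"[^"]*"|'[^']*'|[^"'<> ]+)\s*)?)*)/?\s*>)"""
    r"""(?P<linktext>.*?)<\s*/a\s*>""",
    re.IGNORECASE,
)

# Hypothetical follow-up pattern: pull the name value (double-, single- or unquoted) out of attrs.
NAME_ATTR = re.compile(r"""\bname\s*=\s*(?:"([^"]*)"|'([^']*)'|([^"'<> ]+))""", re.IGNORECASE)


def named_anchors(markup):
    """Return (name, link text) for each <a> whose attributes include name=."""
    found = []
    for m in ANCHOR.finditer(markup):
        attr = NAME_ATTR.search(m.group("attrs"))
        if attr:
            # exactly one of the three quoting alternatives matched; take that one
            name = next(g for g in attr.groups() if g is not None)
            found.append((name, m.group("linktext")))
    return found


sample = '<a name="intro">Intro</a> <a href="x.html">X</a> <A NAME=end>The end</A>'
print(named_anchors(sample))  # [('intro', 'Intro'), ('end', 'The end')]
```

The href-only link still matches ANCHOR but is dropped because its attributes contain no name=, which is the same "find every anchor, then filter" split the thread describes.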
Republikin, 11-21-2005, 09:39 PM
kline11, I think what he's asking is whether there is an easier, or perhaps more efficient, way than RegExp. I don't think there is, though.

HockeyFan, 11-21-2005, 11:27 PM
Yeah. I'm aware that I can use regular expressions, and I'm also aware that this forum is supposed to be about ASP. I'm also aware that I can access various DLLs, such as MSHTML, from ASP. That's why I asked.