Converting HTML to EXCEL (XLS)
06-23-2006, 11:29 PM
|
Converting HTML to EXCEL (XLS)
|
Posts: 626
|
Can anyone help me convert my HTML page into xls format (excel)?
Excel opens my HTML page great, but I need to actually convert the file to XLS format.
Any help is appreciated... If no one knows how to do this, can you point me to scripts and/or tutorials?
|
|
|
|
06-24-2006, 05:19 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 42,385
Name: Chris Hirst
Location: Blackpool. UK
|
File -> Save As -> select XLS as the file type?
Apart from that, why?
__________________
Chris.
A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?
|
|
|
|
06-24-2006, 11:07 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
I have tried that. However, the file is actually an HTML file that Excel simply loads. But I need to extract the information from the file so that I can save it in my database.
The problem is that when I use my script to extract the info, it can't read the file because it isn't a true XLS file. So I have to save the file, open it in Excel, click Save As XLS, then run my script to extract the information.
I need to automate this process... Here is what I want to do:
1. Download the .asp file and save on server (which contains the table for the XLS file) - I can do this no problem
2. run the script to convert to proper XLS format.
3. run the script to extract raw data from the XLS file
4. save data to database
In order to do this, I need to be able to convert the HTML data from the .asp file to XLS format.
Do you know if it is possible?
|
|
|
|
06-25-2006, 04:46 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 42,385
Name: Chris Hirst
Location: Blackpool. UK
|
Because this is the PHP forum and the script is in ASP, should we assume that the ASP page is remote?
if so, this
Quote:
|
I need to be able to convert the HTML data from the .asp file to XLS format
|
isn't likely to be possible server-side.
I can't see an easy way to accomplish this, although I'm working blind because I've no idea what any of this data or these scripts look like or do.
But at a guess
You will need to scrape the .asp page to get the raw source, strip out the HTML code and any data you do not need, format the bits you want as a CSV file/string then open or import that into Excel.
Looking at your list of steps though, you need to rethink what exactly you are doing.
Why do you need steps 2 & 3? Is there a specific reason you use Excel?
An order of
1. Download the .asp file and save on server (which contains the table for the XLS file) - I can do this no problem
2. run a PHP script to extract raw data from file
3. save data to database
would be easier
|
|
|
|
06-25-2006, 02:08 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
I would absolutely agree that it would be easier; however, I asked in another thread how to parse the HTML information without any luck. Basically, I don't know how to strip out the HTML code and get the information out. So I figured there might be a script somewhere (or a tutorial) that could convert an HTML table into an Excel file.
I'm sorry to make you work blind, but the table isn't mine; it's from my head office. I am allowed to view it, but I don't know if I am allowed to release it to the public.
It is a big table/spreadsheet of data, but I only need data from a 10-cell block (5x2). It contains interest rates, and basically I just want to *scrape* out the interest rates and save them in my database. Even if I could get the file into CSV format, it would be much easier.
Oh... yes, it is remote, so the .asp page is really just an HTML page.
|
|
|
|
06-25-2006, 03:46 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 42,385
Name: Chris Hirst
Location: Blackpool. UK
|
Without seeing the structure of the page output it simply isn't possible to say exactly how it could be done.
Some pointers on conversion methods: you simply have to treat the HTML as a delimited file and write an algorithm to extract the information from the page.
So:
strip out all HTML tags except the table row and cell ones
strip out all \n characters
replace all </tr> with \n
replace all <tr ...> with nothing
replace all </td> with "," (comma)
replace all <td ...> with nothing
Get the idea? You simply reduce the HTML source code to a series of lines, each with its columns separated by commas.
You can then parse this into an array and extract the elements that you need.
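A minimal PHP sketch of the steps above, assuming every value sits in its own <td> (or <th>) and every record in its own <tr>, with no nested tables, colspan/rowspan, or commas inside cells. The function name htmlTableToRows() is hypothetical. It also drops <head>, <script> and <style> blocks first, because strip_tags() removes tags but keeps the text between them, and it finishes by splitting each comma-separated line into an array.

```php
<?php
// Sketch of the "treat the HTML as a delimited file" idea described above.
// Assumes one value per <td>/<th>, one record per <tr>, no nested tables and
// no commas inside cells. htmlTableToRows() is a hypothetical helper name.
function htmlTableToRows(string $html): array
{
    // strip_tags() keeps the text inside <head>/<script>/<style>, so drop those blocks first.
    $text = preg_replace('#<(head|script|style)\b.*?</\1\s*>#is', '', $html);
    // Strip all HTML tags except the table row and cell ones.
    $text = strip_tags($text, '<tr><td><th>');
    // Strip existing line breaks so that only </tr> produces new lines.
    $text = str_replace(["\r", "\n"], '', $text);
    // Replace every </tr> with a newline (one line per record).
    $text = preg_replace('#</tr\s*>#i', "\n", $text);
    // Replace every </td> or </th> with a comma.
    $text = preg_replace('#</t[dh]\s*>#i', ',', $text);
    // Replace every opening <tr ...>, <td ...> or <th ...> with nothing.
    $text = preg_replace('#<t[rdh]\b[^>]*>#i', '', $text);

    $rows = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // whitespace between rows, or nothing after the last row
        }
        if (substr($line, -1) === ',') {
            $line = substr($line, 0, -1); // the last </td> left a trailing comma
        }
        // Parse the comma-separated line into an array of trimmed cell values.
        $rows[] = array_map('trim', explode(',', $line));
    }
    return $rows;
}

$html = '<table><tr><th>Term</th><th>Rate</th></tr>'
      . "<tr><td>1 year</td>\n<td>4.5</td></tr></table>";
print_r(htmlTableToRows($html));
```

From there, picking out the 5x2 block is a matter of indexing into the returned array.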
|
|
|
|
06-25-2006, 10:51 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 84
|
If you mean parsing the content of the HTML document, then you would need to create a parser script for these HTML files. You can save the extracted data as tab-delimited text, which you can open easily in Excel.
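A minimal sketch of the tab-delimited output, assuming the rows have already been extracted into a PHP array. The helper rowsToTsv() and the sample rows are made up for illustration. Tabs and line breaks inside values are replaced with spaces because they would otherwise shift the columns.

```php
<?php
// Turn already-extracted rows into tab-delimited text that Excel can open.
// rowsToTsv() is a hypothetical helper; the sample data is invented.
function rowsToTsv(array $rows): string
{
    $out = '';
    foreach ($rows as $row) {
        $cells = [];
        foreach ($row as $value) {
            // A tab or line break inside a value would shift the columns.
            $cells[] = str_replace(["\t", "\r", "\n"], ' ', (string) $value);
        }
        $out .= implode("\t", $cells) . "\r\n"; // CRLF line endings, Windows style
    }
    return $out;
}

$tsv = rowsToTsv([['Term', 'Rate'], ['1 year', '4.5']]);
echo $tsv;
// To hand it to Excel, save it with a .txt extension, e.g.:
// file_put_contents('rates.txt', $tsv);
```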
|
|
|
|
06-26-2006, 12:34 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
Thanks guys... Is there any way you can give me a sample HTML extraction algorithm? Just imagine 2 columns and 6 rows with numbers in every row. I don't really know how to search out all the <table>, <tr> and <td> tags.
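One possible sample, using PHP's DOM extension to read every <tr> of a table into an array. Note this is a different technique from the string-replacement approach suggested elsewhere in this thread. It assumes the data is in the first <table> on the page and that there are no nested tables; extractTable() is a hypothetical name and the 2x6 sample data is invented.

```php
<?php
// Alternative sketch: let PHP's DOM parser find the <tr>/<td> tags.
// Assumes no nested tables; extractTable() is a hypothetical helper name.
function extractTable(string $html, int $tableIndex = 0): array
{
    if ($html === '') {
        return []; // loadHTML() refuses an empty string
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return [];
    }
    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only <td>/<th> children; skip whitespace text nodes.
            if ($cell instanceof DOMElement && in_array(strtolower($cell->nodeName), ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

// 2 columns x 6 rows of numbers, as in the question.
$html = '<table>';
for ($i = 1; $i <= 6; $i++) {
    $html .= "<tr><td>$i</td><td>" . ($i * 10) . "</td></tr>";
}
$html .= '</table>';
print_r(extractTable($html));
```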
|
|
|
|
06-26-2006, 12:46 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 84
|
What you should do is look for unique patterns, because it's hard to extract the data if you just look for <table and <td. Inside these tags there will be unique patterns; check every line for these patterns, and if you find one, parse the line and extract the data. When a record is complete, write it to a tab-delimited file, then continue extracting records.
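A minimal sketch of the line-by-line pattern idea, assuming each rate cell carries a unique marker. The marker <td class="rate"> and the function name scanForRates() are hypothetical; substitute whatever is actually unique in the page.

```php
<?php
// Sketch of the "check every line for a unique pattern" approach.
// The marker <td class="rate"> and scanForRates() are hypothetical.
function scanForRates(string $html): array
{
    $rates = [];
    // Split on any line ending (Windows, Unix or old Mac).
    foreach (preg_split('/\r\n|\r|\n/', $html) as $line) {
        // Capture the text between <td class="rate"> and the next </td> on this line.
        if (preg_match_all('#<td class="rate">\s*([^<]*?)\s*</td>#i', $line, $m)) {
            foreach ($m[1] as $value) {
                $rates[] = $value;
            }
        }
    }
    return $rates;
}

$page = "<tr><td>1 year</td><td class=\"rate\"> 4.5 </td></tr>\n"
      . "<tr><td>2 years</td><td class=\"rate\">4.75</td></tr>";
print_r(scanForRates($page));
```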
|
|
|
|
06-26-2006, 03:17 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 42,385
Name: Chris Hirst
Location: Blackpool. UK
|
Can't be more specific without seeing the data.
But I would assume that the source is consistent each time the page is pulled and only the data changes. So you could use simple str_replace lines to remove much of the code (meta tags, opening and closing <body>, <html>, etc.).
If you want to remove everything in the <head> at once, the regular expression for preg_replace would be (the s modifier lets . match across line breaks)
PHP Code:
'#<head>.*?</head>#is'
(think that's right)
If there are script blocks
PHP Code:
'#<script.*?>.*?</script>#is'
Then you should be down to the steps I outlined earlier in the thread. I was guessing there that each item of data would be in its own cell and each grouping on the "Y" axis in rows.
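A minimal sketch of the head and script patterns as preg_replace() calls. PCRE patterns in PHP need delimiters (# here), and the s modifier matters because <head> and <script> blocks usually span several lines, and . does not match a newline without it. The name stripHeadAndScripts() is hypothetical, and \b is added after the tag name as an assumption so that tags with attributes, such as <script src=...>, are still matched.

```php
<?php
// The head/script patterns as preg_replace() calls (stripHeadAndScripts() is hypothetical).
// "#" is the pattern delimiter; "i" ignores case; "s" lets "." match newlines.
function stripHeadAndScripts(string $html): string
{
    // Lazy ".*?" stops at the first closing tag instead of the last one.
    $html = preg_replace('#<head\b.*?</head>#is', '', $html);
    $html = preg_replace('#<script\b.*?>.*?</script>#is', '', $html);
    return $html;
}

$page = "<html>\n<head>\n<title>T</title>\n</head>\n<body>\n"
      . "<script>var a = 1;</script>\n<table></table>\n</body>\n</html>";
echo stripHeadAndScripts($page);
```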
|
|
|
|
06-26-2006, 11:30 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
Ok...
Here is a bit of the html page I need to scrape:
Code:
<html>
<head>
<title>Virtual Office</title>
<META HTTP-EQUIV="Refresh" CONTENT="3600; URL=LogoutExpired.php">
<link rel="STYLESHEET" type="text/css" href="includes/style/style.css">
<script src="includes/javascript.js" type="text/javascript"></script>
<script language=JavaScript>
function printWindow() {
alert("Before printing, ensure that page settings are for legal size and landscape orientation.");
bV = parseInt(navigator.appVersion);
if (bV >= 4) {
window.print();
} else {
window.print();
}
}
</script>
</head>
<body>
Here is the code including the regular expression I am testing it with:
Code:
<?php
$line = "";
$fr1 = fopen("c:\file.htm", "r") or die("Couldn't open file");
while(!feof($fr1)) {
$line .= fgets($fr1, 2048);
}
fclose($fr1);
$line = eregi_replace('<script.*>.*<\/script>?', '<s></s>', $line);
echo "<textarea>$line</textarea>";
?>
For some reason it isn't working... Basically, I am attempting to re-write <script></script> tags (and all contents) to <s></s>. There really isn't a reason for this other than I am trying to understand regular expressions. I tried putting the ? after .* but I was getting an error message.
|
|
|
|
06-27-2006, 04:40 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 42,385
Name: Chris Hirst
Location: Blackpool. UK
|
The code above works OK for me.
Are you testing this on a Windows machine?
If so, your path to the file needs to either use forward slashes ( c:/file.htm ) or "escape" the backslash ( c:\\file.htm ).
Otherwise, what errors are you getting?
Does PHP have access rights to the root of the C: drive?
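A short illustration of the backslash problem, assuming PHP 5.2.5 or later, where "\f" inside double quotes is a form-feed escape:

```php
<?php
// In double quotes (PHP 5.2.5+) "\f" is a form-feed escape, so the path
// "c:\file.htm" silently contains a form feed, not a backslash followed by "f".
$broken  = "c:\file.htm";
$forward = "c:/file.htm";   // forward slashes also work on Windows
$escaped = "c:\\file.htm";  // an escaped backslash
$single  = 'c:\file.htm';   // single quotes leave \f alone

var_dump(strlen($broken));        // int(10): "\f" became one character
var_dump($escaped === $single);   // bool(true)
```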
|
|
|
|
06-27-2006, 01:02 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
Sorry, I guess I should clarify what is happening... First, yes, I am currently testing it on a Windows machine running Apache 2.0.54.
I should say that the script runs without any errors, and it does strip out the <script> tags and all the content within them. However, the regular expression isn't working the way I intend.
You will notice that within <head></head> there are 2 <script></script> tags with content. I am expecting the regular expression to EXTRACT the <script> tags and their content and replace them with <s></s> tags.
The problem is that when I run the script it strips out BOTH sets of <script> tags and their content but only replaces it with ONE set of <s></s> tags when it should replace it with 2 sets of <s></s> tags.
Therefore, I think there is a problem with my reg exp.
Any suggestions?
|
|
|
|
06-27-2006, 01:47 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 2,918
Name: Keith Marshall
Location: Connecticut
|
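One thing about regular expressions: they are greedy unless you tell them not to be. A .* grabs as much text as it can, so <script.*>.*</script> matches from the first <script all the way to the last </script>, and both blocks become one match (hence only one <s></s>).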
http://www.regular-expressions.info/
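A sketch of the earlier test script rewritten around a lazy quantifier (this is not the original poster's final code). It assumes PHP 7 or later, where eregi_replace() no longer exists; preg_replace() uses PCRE, which supports the lazy .*? that POSIX ereg patterns lack, which is probably why adding ? after .* raised an error. replaceScripts() is a hypothetical name.

```php
<?php
// Rewrite of the earlier test script (a sketch, not the poster's final code).
// eregi_replace() was removed in PHP 7; preg_replace() uses PCRE, which
// supports the lazy quantifier "*?". replaceScripts() is a hypothetical name.
function replaceScripts(string $html): string
{
    // "[^>]*" stays inside the opening tag; ".*?" stops at the FIRST </script>,
    // so each block is replaced separately. "s" lets "." cross line breaks.
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '<s></s>', $html);
}

// Forward slashes avoid "\f" being read as an escape in a double-quoted path.
$path = "c:/file.htm";
if (is_readable($path)) {
    $line = file_get_contents($path);
    echo '<textarea>' . htmlspecialchars(replaceScripts($line)) . '</textarea>';
}
```

With the two script blocks from the page above, this yields two <s></s> pairs instead of one.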
__________________
<mgraphic /> - I don't have a solution but I admire the problem.
|
|
|
|
06-30-2006, 03:04 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 22
|
You can try this.
Use the mouse to select the content from the page you want to copy. I assume that it is in some sort of table. Copy it with Ctrl+C.
Go to Excel and press Ctrl+V to paste what you've got from the HTML page into a spreadsheet. It will look like crap.
Don't click anything or go anywhere; press Ctrl+C right after the Ctrl+V above (the pasted range is still selected), then move to a different sheet tab in the Excel workbook.
Select Edit | Paste Special, and choose to paste values.
You might have to adjust the first row, but then you should be good with just the content of what was on the HTML page.
Last edited by steve49589; 06-30-2006 at 03:09 PM.
|
|
|
|
06-30-2006, 04:46 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 1
Name: Franklin
Location: NJ
|
Sorry, not my area, but I'll talk to my buddy; he might know, so hold on.
Quote:
|
Originally Posted by zincoxide
Can anyone help me convert my HTML page into xls format (excel)?
Excel opens my HTML page great, but I need to actually convert the file to XLS format.
Any help is appreciated... If no one knows how to do this, can you point me to scripts and/or tutorials?
|
|
|
|
|
06-30-2006, 05:14 PM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 626
|
Thanks for all your help guys... I took the suggestion of using a regex and stripped out all the HTML and just made it into a CSV file.
I have now created the script which automatically updates my rates in my database.
Thanks for all the help... It was great!
|
|
|
|
12-09-2010, 07:09 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 1
Name: Md. Tariqul Islam Drubo
|
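Just add the following lines at the very beginning of the PHP file that you want to export to Excel (before anything is output). They make the browser download the page's output with an .xls file name; Excel will open it, but the content is still HTML rather than a true XLS file.
// Keep only safe characters from ?f=... for the download file name
$name = preg_replace('/[^A-Za-z0-9_-]/', '', $_GET['f'] ?? 'export');
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . $name . '.xls"');
header("Pragma: no-cache");
header("Expires: 0");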
|
|
|
|
12-09-2010, 07:23 AM
|
Re: Converting HTML to EXCEL (XLS)
|
Posts: 156
|
Look for sites like mediaconverter; they will be able to do just what you need.
|
|
|
|
|