using PHP to login to a website and parse html
bous
All American
11215 Posts

I am working on a project for my company...

I want to be able to use PHP to log in to a few different websites and then go to specific pages and parse the HTML. What would be the best way to do this? Once I get the HTML I'm good to go.

They both use HTTPS and log in using a small FORM on the login page. From there, once the session is set, I'd like to browse to specific, already-known webpages.

PHP on Linux

5/1/2007 3:20:26 PM

BigMan157
no u
103352 Posts

http://us2.php.net/manual/en/ref.curl.php ?

[Edited on May 1, 2007 at 3:46 PM. Reason : then http://us.php.net/dom to parse through it]
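
For the form login described above, a minimal cURL sketch might look like the following. It assumes the login form is a plain POST and that the session lives in cookies; the function name, the URLs and the form field names are all hypothetical, so check the real form's `action` and `name` attributes first.

```php
<?php
// Hypothetical helper: POST the login form, then GET a page using the
// same cURL handle so the session cookies carry over.
function fetch_page_after_login(
    string $loginUrl,
    array $formFields,
    string $pageUrl,
    string $cookieJar
): string {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow the post-login redirect, if any
        CURLOPT_COOKIEFILE     => $cookieJar, // turns on cURL's cookie handling
        CURLOPT_COOKIEJAR      => $cookieJar, // cookies are written here when the handle is freed
    ]);

    // Step 1: submit the login form.
    curl_setopt_array($ch, [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($formFields),
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: switch back to GET and request the page we actually want.
    curl_setopt_array($ch, [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    return $html;
}

// Example call (placeholder URLs and field names, not real ones):
// $html = fetch_page_after_login(
//     'https://example.com/login',
//     ['username' => 'me', 'password' => 'secret'],
//     'https://example.com/reports/daily',
//     sys_get_temp_dir() . '/session_cookies.txt'
// );
```

Sites that put a hidden token in the login form would need an extra GET of the login page first to pick that token up; this sketch does not cover that case.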

5/1/2007 3:43:50 PM

bous
All American
11215 Posts

curl! that's what i was trying to remember.

thanks!

5/1/2007 3:49:03 PM

qntmfred
retired
40556 Posts

i've never used the DOM functions to parse the html. i've always just formed my own regular expressions. i knew there were various html parsing libraries and functions out there, but they just seemed under-developed and kinda boxed you into what you could do. are they that much better now? are they easy to work with, are they flexible enough to correctly handle malformed html?

[Edited on May 1, 2007 at 4:34 PM. Reason : .]
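
On the malformed html question, here is a small sketch using PHP's DOMDocument, which parses through libxml's HTML parser and tends to recover from unclosed tags. The helper name and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: return the text of every <td> in an HTML string,
// even when the markup is sloppy.
function table_cell_texts(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        $texts[] = trim($cell->textContent);
    }
    return $texts;
}

// Unclosed <td> and <tr> tags, the kind of thing hand-written pages are full of.
$sample = '<table><tr><td>Widgets<td>42<tr><td>Gadgets<td>17</table>';
print_r(table_cell_texts($sample));
```

The parser repairs the tree as it goes, so the result reflects its best guess at the intended structure rather than the literal source.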

5/1/2007 4:10:55 PM

bous
All American
11215 Posts

i'm gonna parse using regexp

5/1/2007 8:01:01 PM

scud
All American
10804 Posts

pagescraping is almost impossible to maintain once you're done...unless this is some sort of one-off tool or the like, I would highly suggest finding another solution to your problem. Perhaps there isn't one, just warning that it can be a real PITA

5/1/2007 8:03:05 PM

bous
All American
11215 Posts

it will be about 4 websites that don't really change except for certain numbers on the page
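
For pages where only the numbers change, a regexp keyed on a fixed label could be enough. This is a rough sketch; the helper name, the labels and the sample markup are invented, and it assumes the number follows its label with at most a colon and some tags in between.

```php
<?php
// Hypothetical helper: find the first number that follows a known label,
// skipping an optional colon and any tags in between.
function number_after_label(string $html, string $label): ?float
{
    $pattern = '/' . preg_quote($label, '/')
        . '\s*:?\s*(?:<[^>]+>\s*)*(-?\d[\d,]*(?:\.\d+)?)/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null; // label not found, or no number after it
    }
    // Drop thousands separators before converting.
    return (float) str_replace(',', '', $m[1]);
}

$sample = '<tr><td>Open orders:</td><td><b>1,204</b></td></tr><p>Balance: 87.50</p>';
var_dump(number_after_label($sample, 'Open orders'));
var_dump(number_after_label($sample, 'Balance'));
var_dump(number_after_label($sample, 'Refunds'));
```

Returning null when the label is missing makes it easier to notice the day a site changes its layout.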

5/1/2007 10:33:17 PM

rynop
All American
829 Posts

I'd use PHP's HTTP request extension.

http://www.php.net/manual/en/ref.http.php

and for what you're doing, specifically the HttpRequest class (http://www.php.net/manual/en/http.HttpRequest.php)

gonna need PECL tho (it's the pecl_http extension).

5/2/2007 4:22:48 PM

qntmfred
retired
40556 Posts

anybody else use DOM packages to parse html?

6/29/2007 10:37:16 AM

30thAnnZ
Suspended
31803 Posts

ha

i was pagescraping espn.com and tsn.ca for hockey scores and news updates for a LONG time, until my server was hitting them about a billion times a minute (didn't know enough php/mysql at the time to cache that shit) and they blocked my traffic
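
The caching mentioned in that parenthetical can be as simple as a file with a time-to-live. A minimal sketch, assuming a file cache in the temp directory is acceptable; the function name, the cache file naming and the stand-in fetcher are hypothetical.

```php
<?php
// Hypothetical helper: return a cached copy of a fetched page if it is
// younger than $ttlSeconds, otherwise call $fetch and store the result.
function cached_fetch(string $key, int $ttlSeconds, callable $fetch, ?string $cacheDir = null): string
{
    $cacheDir = $cacheDir ?? sys_get_temp_dir();
    $path = $cacheDir . DIRECTORY_SEPARATOR . 'scrape_' . md5($key) . '.cache';

    clearstatcache(true, $path); // make sure we see the file's current mtime
    if (is_file($path) && (time() - filemtime($path)) < $ttlSeconds) {
        return file_get_contents($path);
    }

    $body = $fetch(); // the only place the remote site gets hit
    file_put_contents($path, $body, LOCK_EX);
    return $body;
}

// Stand-in fetcher for illustration; a real one would make the HTTP request.
$fakeFetch = function (): string {
    return '<html><body>scores go here</body></html>';
};

// However often this runs, the fetcher is called at most once per 5 minutes.
echo cached_fetch('example-scores-page', 300, $fakeFetch), "\n";
```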

much better ways to accomplish this stuff

6/29/2007 1:30:51 PM

qntmfred
retired
40556 Posts

such as?

6/29/2007 4:50:45 PM

qntmfred
retired
40556 Posts

^^ did you have a better method?

Quote :
"anybody else use PHP DOM packages to parse html?"

7/20/2007 2:07:59 PM

philihp
All American
8349 Posts

a regular expression.

7/20/2007 2:22:27 PM

qntmfred
retired
40556 Posts

yeah, i've always used regular expressions, but the page i'm currently scraping has 7-deep nested tables with no IDs or anything and it's a PITA.

javascript style getElementsByTagName et al are so much easier to use
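
For deeply nested tables with no IDs, one related option is an XPath query on top of PHP's DOM, which can pick out the innermost table without walking each level by hand. This is only a sketch of that idea, not what was actually used here; the helper name and sample markup are made up.

```php
<?php
// Hypothetical helper: return the cell texts of the innermost table(s),
// i.e. tables that contain no further <table>.
function innermost_table_cells(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//table[not(.//table)]//td') as $cell) {
        $texts[] = trim($cell->textContent);
    }
    return $texts;
}

// Three levels of nesting stand in for the 7-deep case.
$sample = '<table><tr><td><table><tr><td>'
    . '<table><tr><td>Score</td><td>3-2</td></tr></table>'
    . '</td></tr></table></td></tr></table>';
print_r(innermost_table_cells($sample));
```

Swapping the query for `$doc->getElementsByTagName('table')` gives the javascript-style access, at the cost of filtering the nesting yourself.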

7/20/2007 2:38:30 PM

© 2024 by The Wolf Web - All Rights Reserved.
The material located at this site is not endorsed, sponsored or provided by or on behalf of North Carolina State University.