timbo All American 1003 Posts
http://hackaday.com/2012/06/14/penny-auction-hacking-put-on-your-statisticians-hat/
If you look at my code, please be nice. 6/19/2012 10:39:31 AM |
BigMan157 no u 103354 Posts
That's pretty neat. I know a guy trying to start his own version of one of those sites.
Why'd you use Selenium? 6/19/2012 12:11:20 PM |
timbo All American 1003 Posts
I tried using BeautifulSoup and urllib, but when they parsed the page, the containers with bidding information were empty because the site fills them in with AJAX after the page loads. The only scraping module that would actually recover the values I wanted was Selenium.
The script basically works by opening an auction in a window, recovering the bidding info, then refreshing every 10 seconds. After the auction ends, the data is organized and dumped into neat files, plus a final summary file that contains all the auction goodies you would want to know (number of bidders, number of bids per user, auction length, etc.). 6/19/2012 12:20:43 PM |
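For anyone following along, here is a rough sketch of the poll-then-summarize flow described above. It is not timbo's actual code. `fetch_bids`, `is_ended` and `wait` are hypothetical hooks standing in for the Selenium calls (read the page, check whether the auction closed, sleep), and it assumes each bid can be read as a (timestamp, username) pair. Auction length is taken here as first observed bid to last observed bid, which is also an assumption.

```python
from collections import Counter
from datetime import datetime


def poll_auction(fetch_bids, is_ended, wait, interval=10):
    """Collect bids until the auction ends, refreshing every `interval` seconds.

    fetch_bids() returns the (timestamp, username) pairs currently on the page,
    is_ended() says whether the auction has closed, and wait(seconds) pauses.
    All three are hypothetical stand-ins for the real browser calls.
    """
    seen = []
    known = set()
    while True:
        for bid in fetch_bids():
            # Consecutive refreshes show overlapping bids, so skip repeats.
            # Assumption: (timestamp, username) is unique per bid.
            if bid not in known:
                known.add(bid)
                seen.append(bid)
        if is_ended():
            return seen
        wait(interval)


def summarize(bids):
    """Build an end-of-auction summary from (timestamp, username) pairs."""
    if not bids:
        return {"bidders": 0, "bids_per_user": {}, "length_seconds": 0.0}
    per_user = Counter(user for _, user in bids)
    times = [ts for ts, _ in bids]
    return {
        "bidders": len(per_user),
        "bids_per_user": dict(per_user),
        # Assumption: length is measured from first to last observed bid.
        "length_seconds": (max(times) - min(times)).total_seconds(),
    }


if __name__ == "__main__":
    demo = [
        (datetime(2012, 6, 19, 12, 0, 0), "alice"),
        (datetime(2012, 6, 19, 12, 0, 10), "bob"),
    ]
    print(summarize(demo))
```

In a real run `wait` would just be `time.sleep`, and the two page hooks would wrap whatever the browser driver exposes. Passing them in keeps the loop testable without a browser.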
qntmfred retired 40809 Posts
kick-ass dude, nice job.
you probably mentioned it in one of your blog posts, but why aren't you just scraping directly with http requests? why bother with selenium? also, you could much more easily analyze the data if you use a real database and not just csv/excel
[Edited on June 19, 2012 at 12:30 PM. Reason : nm just read ^ still, i can't imagine you couldn't scrape it if you know the right ajax] 6/19/2012 12:23:52 PM |
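To illustrate the database suggestion, a minimal sketch using SQLite from Python's standard library. The table layout (`auction_id`, `bidder`, `bid_time`) and the function names are made up for the example and are not from the actual scraper.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the bids table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bids ("
        " auction_id TEXT NOT NULL,"
        " bidder TEXT NOT NULL,"
        " bid_time TEXT NOT NULL)"
    )
    return conn


def store_bids(conn, auction_id, bids):
    """Insert an iterable of (bidder, iso_timestamp) pairs for one auction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO bids VALUES (?, ?, ?)",
            [(auction_id, bidder, when) for bidder, when in bids],
        )


def bids_per_user(conn, auction_id):
    """Return (bidder, bid_count) rows for one auction, busiest bidder first."""
    rows = conn.execute(
        "SELECT bidder, COUNT(*) FROM bids WHERE auction_id = ?"
        " GROUP BY bidder ORDER BY COUNT(*) DESC, bidder",
        (auction_id,),
    )
    return rows.fetchall()
```

The point is that questions like "bids per user" become a one-line query across every auction instead of a pass over a pile of CSV files.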
timbo All American 1003 Posts
You're right about scraping the AJAX requests directly. There is a way to do it, but it requires individual cookies that the server generates for each auction (http://pennystats.blogspot.com/2012/04/very-interesting-find.html). The data that came back was messy, and I honestly didn't know how to generate valid auction cookies so that I could request it directly.
Selenium offered a turn-key solution that just worked, so I went with it.
[Edited on June 19, 2012 at 12:42 PM. Reason : .] 6/19/2012 12:39:28 PM |
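For reference, sending a cookie with a direct request is the easy half of this. A sketch with the standard library is below, where the URL and the cookie names are placeholders. The hard part mentioned above, getting a cookie value the server will accept, is not solved here.

```python
import urllib.request


def build_auction_request(url, cookies):
    """Build a GET request carrying the given cookies.

    `cookies` is a dict of name -> value. The names and values are
    placeholders; how the site generates valid ones is unknown.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


def fetch_text(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the cookie values could be captured from a real browser session, `fetch_text(build_auction_request(url, cookies))` would return the raw response for parsing.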
BigMan157 no u 103354 Posts
i too would have approached it with php/curl and DBed the data, but hey, if you got a solution that works for you why not 6/19/2012 12:44:18 PM |
timbo All American 1003 Posts
I am pretty sure the majority of people who scrape this data use PHP and dump it into a database (for example, http://www.allpennyauctions.com/).
Another benefit of doing it that way would be that I could use a significantly less powerful server to scrape data. Right now I have a dual-core Xeon server (3.3 GHz) with 8 GB of RAM chugging away, and it can only scrape about 2000-2500 auctions per day. I think if I upped the RAM to 16 GB I could probably grab them all at once. 6/19/2012 12:50:07 PM |
mildew Drunk yet Orderly 14177 Posts
http://pennystats.blogspot.com/2012/04/first-post-in-what-could-be-quite.html
That pop up next to the scroll bar is annoying as shit 6/19/2012 12:54:29 PM |
timbo All American 1003 Posts
I usually use a scrolly mouse so I never noticed. I can see how that would be annoying.
The worst part is that Blogger doesn't allow you to modify its "Dynamic" theme, so there's nothing I can do about it. 6/19/2012 12:57:55 PM |
synapse play so hard 60940 Posts
Very nice work. 6/19/2012 3:25:39 PM |
xienze All American 7341 Posts
You'd have to venture over to Java, but HtmlUnit would give you a way to run the page's JavaScript. 6/19/2012 6:44:43 PM |
Hiro All American 4673 Posts
This thread is epic. Great work timbo 6/19/2012 7:17:51 PM |
Moox All American 612 Posts
How can I use this to make money? 6/20/2012 12:41:39 AM |
timbo All American 1003 Posts
You need to break down the data and decide which items you want to target. Then look for the best time to try to win.
The charts of the day are useful for doing this, this one in particular: http://pennystats.blogspot.com/2012/06/pennystats-chart-of-day-61112.html 6/20/2012 9:18:32 AM |
Moox All American 612 Posts
So basically I should log in during the middle of the night on weekends, buy $50 gift cards, and sell them to Plastic Jungle?
That simple? 6/21/2012 4:00:08 AM |
timbo All American 1003 Posts
That was my theory. But 5000+ people have read my blog since then, so I don't know if it still applies. You could always use my software to mine the data and see if those statistics are still accurate.
[Edited on June 21, 2012 at 1:29 PM. Reason : spelling] 6/21/2012 1:28:47 PM |