How Do I Learn About [n]?

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: How Do I Learn About [n]?

Postby Ben-oni » Sat Sep 01, 2012 5:35 pm UTC

Shivahn wrote:Ok, so from time to time a task comes up at work that is mind-numbing and repetitive, and I am lazy, so as soon as something like that shows up I try to find a way to make the computer do that for me. Well, one of those is coming up, but fixing it requires knowledge in an arena I have not coded in: internet interfaces. I need to basically take a massive file with information and format it (no big deal), then do stuff to it. I guess I'll describe what I do now. I log into a website (which creates a pop-up window that's the one I actually communicate with), use one of the fields on the site to search and see if the entity I'm working on is in the system, if not, click a link which brings me to a registration page and then fill that page in with data about the entity, click another hyperlink and mess with a couple of drop-down menus, click (I THINK it's a hyperlink) in a calendar-type thing, then click a checkbox next to a specific time, then click a save button.

I have a big file from which I can get all the data I need for the forms, but I don't know how to write a program that communicates with a website like this. I'd probably write the thing in Python, but a language-agnostic tutorial would be excellent. Does anyone have any suggestions? I know very basic network theory, but I wouldn't know how to begin either logging in with a console-based program or navigating menus, boxes, search fields, and so on with one.

There are two aspects to what you need to deal with. The first part is HTTP, which is pretty straightforward. You won't have to know the specifics of the protocol, just the nature of the GET and POST messages. You could write code to handle that (it's not hard), but every language already has an API written for doing so.

The second part is navigating the HTML DOM. This could be a bit finicky. If the web pages are poorly written and don't conform to standards, some HTML parsers could fail miserably. Anyways, you'll have to understand the HTML tree so you can figure out whether your entity is present. The code will look something like this:

Code:

for entity in database:
    # httpGET, httpPOST and entityExists are placeholders for your HTTP library and your DOM check
    dom = httpGET("http://domain.name/path/page?entity=" + entity.id)
    if not entityExists(dom):
        httpPOST("http://domain.name/path/save", "field1=" + entity.field1 + "&field2=" + entity.field2)


That's very rough and full of bad practices, but it should get you started. Of course, ideally you'd just use SQL and be done, but that assumes you can access the database backend...
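
To make the pseudocode above a little more concrete, here is a minimal Python sketch using only the standard library. The URLs are the placeholders from the pseudocode; the dictionary layout of each entity, the "No results" check in `entity_exists`, and the swappable `fetch` argument are all assumptions for illustration, not anything the real site is known to use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://domain.name/path"  # placeholder host from the pseudocode above


def http_fetch(url, body=None):
    """GET when body is None, otherwise POST the form-encoded body."""
    data = body.encode("ascii") if body is not None else None
    with urlopen(url, data=data) as response:
        return response.read().decode("utf-8", errors="replace")


def entity_exists(html):
    # Hypothetical check: assumes the search page prints this text on a miss.
    return "No results" not in html


def sync(entities, fetch=http_fetch):
    """Register every entity the site does not already know; return their ids."""
    created = []
    for entity in entities:
        page = fetch(BASE + "/page?" + urlencode({"entity": entity["id"]}))
        if not entity_exists(page):
            form = urlencode({"field1": entity["field1"], "field2": entity["field2"]})
            fetch(BASE + "/save", form)
            created.append(entity["id"])
    return created
```

Passing `fetch` as an argument means the loop can be tried out against a fake function before it is pointed at the live site.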

Shivahn
Posts: 2199
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: How Do I Learn About [n]?

Postby Shivahn » Sun Sep 02, 2012 4:36 pm UTC

Hmm, I see. I'll need to look into those. I really don't know much about HTTP so there is quite a bit I have to learn.

bittyx
Posts: 194
Joined: Tue Sep 25, 2007 9:10 pm UTC
Location: Belgrade, Serbia

Re: How Do I Learn About [n]?

Postby bittyx » Sun Sep 02, 2012 10:02 pm UTC

@Shivahn:

I'm a PHP dev, and haven't had to do a task like this, but if I were to do it, I'd likely use Python as well (the basics of what I'd do are pretty much the same, just using different libraries/syntax) - a fine opportunity to learn a bit of Python as well :D

First off, you should check whether your pop-up is loaded via AJAX or as a separate page or whatever. Now, assuming the popup has a <form> element (use Firebug in FF or Chrome developer tools or whatever, to find this out), it's probably POSTing your search query to some page (I have no idea how experienced you are with web-stuff - <form action="page.php" method="POST"> means that the form data is being sent via the POST method to page.php). Basically, you need to find out the exact URL that the form is posting to, and send data there from your script - it might also be that this is done via javascript, so you would have to dig into javascript and check where the search query is being sent to.
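
If the form is present in the page's HTML (rather than built by javascript), the target URL and field names can also be pulled out by a script. The following is a rough sketch with Python's standard-library parser; `FormFinder`, `find_forms` and the example URL are made-up names for illustration, and inputs are simply attached to the most recently opened form.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Collect action, method and named input fields of each <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action", ""),
                "method": (attrs.get("method") or "GET").upper(),
                "fields": {},
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Keep the default value so it can be sent back unchanged.
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""


def find_forms(html, page_url):
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        # Relative actions such as "page.php" resolve against the page's own URL.
        form["action"] = urljoin(page_url, form["action"])
    return finder.forms
```

The browser's developer tools remain the quicker way to check this by hand; a helper like this is mainly useful if the form changes between pages.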

I'd post my actual search query using cURL (I see that Python has pycurl - I'd either try that, since I'm somewhat familiar with cURL, or play around with urllib2, as demonstrated here). You don't really need to know much about HTTP for this, since the libraries you use will take care of most of that (though if you can spare the time, lightweight projects like this one are the ideal place to learn something new!)

Let's say you find out that this is http://www.site.com/search.php - since this is behind a login, your script should also do that as well. As a quick hack, I'd likely just log in with my browser, and check the cookies - there will be some kind of session id or something there - I'd probably just copy all the cookies there and send them from my script as well, including also the user-agent string my browser sends (since a lot of sites try to match user-agents within a single session, to make session hijacking harder) - another important thing is to check whether (and how often) the site regenerates session IDs (basically, how often does the site send the session cookie with a different value), so I'd know whether my script also has to accept and set cookies - though I can't imagine this would be hard to implement anyway (cookies are basically just an associative array you keep memorized).
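
As a sketch of the "copy the cookies from the browser" hack, the cookies really can be kept as a plain dictionary. The cookie name `PHPSESSID`, its value and the user-agent string below are placeholders, and the two helper functions are hypothetical names, not part of any library.

```python
from urllib.request import Request


def browser_headers(cookies, user_agent):
    """Build the headers that make a scripted request look like the browser session."""
    cookie_line = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return {"Cookie": cookie_line, "User-Agent": user_agent}


def update_cookies(cookies, set_cookie_values):
    """Fold Set-Cookie header values into the dict, so regenerated session IDs are kept."""
    for raw in set_cookie_values:
        pair = raw.split(";", 1)[0]  # drop attributes such as Path or HttpOnly
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder values: copy the real ones from your browser's developer tools.
cookies = {"PHPSESSID": "abc123"}
headers = browser_headers(cookies, "Mozilla/5.0 (placeholder)")
request = Request("http://www.site.com/search.php", headers=headers)
```

For anything beyond a quick hack, the standard library's `http.cookiejar.CookieJar` together with `urllib.request.HTTPCookieProcessor` handles the accept-and-resend cycle automatically.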

Now the form that posts the data is likely something like

Code:

<form action="search.php" method="POST">
<input type="text" name="query" value="" />
<input type="submit" name="submit" value="Submit" />
</form>

That means that the POSTed request will be something like

Code:

query=somethingsomething
(don't forget to urlencode somethingsomething), which is what you need to send to the search script - it's very likely though that your library (pycurl/urllib2/whatever) already deals with this, and all you need to provide is the raw POST array (PHP's cURL library has this so I assume others do as well).
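
For example, in Python the encoding step is a single call to the standard library. The field names match the form above; the search term is an invented value chosen because it contains characters that need escaping.

```python
from urllib.parse import urlencode

# Hypothetical search term with a space and an ampersand in it.
fields = {"query": "Acme & Sons", "submit": "Submit"}
body = urlencode(fields)
print(body)  # query=Acme+%26+Sons&submit=Submit

# urllib.request expects the POST body as bytes.
payload = body.encode("ascii")
```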

Okay, so now we can fetch the resources we're interested in, and send data - but what do we do with them?

You probably want to use some DOM parser, because that's the easiest (and the correct) way to do stuff like this. Malformed HTML on their pages could be a problem, but luckily Python has Beautiful Soup, which copes remarkably well with bad HTML.

At this point, you should manually inspect the HTML structure of all the relevant pages (ie. the link that brings you to the registration page, the fields in the registration page, etc.) and from then on it's just a matter of POSTing the correct data once again to the correct link (all of which you can find out with a quick inspection from your browser). Of course, I assume it's easy enough to retrieve your data from your big file, so that's mostly that...

Helpful hint: when traversing the DOM, you are pretty much only interested in how to get to the element you want in the easiest possible way, while identifying it uniquely. To be more specific, say you have something like this:

Code:

<html>
<head>
<title>Page</title>
</head>
<body>
<div id="container">
  <form action="register.php" method="POST">
    <input type="text" name="item_name" value="Default item name" />
    <input type="text" name="item_qty" value="0" />
    <input type="submit" name="submit" value="Submit" />
  </form>
  <a id="go-to-drop-down-menus" href="some_other_script.php">Linky!</a>
</div>
</body>
</html>

Here, one way to reach the link (assuming you want to, say, find out the href the link is pointing to) is "html > body > div#container > a#go-to-drop-down-menus", which would be traversing the tree from the root - but since you know element IDs are unique within a document (assuming, again, that the HTML isn't awful), you can just find it via its id - "a#go-to-drop-down-menus" - in Beautiful Soup, you'd do something like soup.find(id="go-to-drop-down-menus").get('href') and that's it. Of course, I'm mostly just rambling here about stuff that comes to mind - this is really where you need to figure out the document structure on your own and improvise from there. Also, if you're familiar with CSS selectors (or jQuery selectors which mostly just augment those in CSS), you could maybe use pyquery, for a more familiar syntax.
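
If Beautiful Soup isn't installed, the same lookup by id can be approximated with the standard library alone. This is only a sketch: `AttrById` and `href_of` are made-up helper names, and unlike Beautiful Soup this parser makes no special effort to repair badly broken markup.

```python
from html.parser import HTMLParser


class AttrById(HTMLParser):
    """Remember the attributes of the first element carrying the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.found is None and attrs.get("id") == self.wanted_id:
            self.found = attrs


def href_of(html, element_id):
    """Return the href of the element with this id, or None if it is absent."""
    parser = AttrById(element_id)
    parser.feed(html)
    return parser.found.get("href") if parser.found else None


page = """
<div id="container">
  <a id="go-to-drop-down-menus" href="some_other_script.php">Linky!</a>
</div>
"""
print(href_of(page, "go-to-drop-down-menus"))  # some_other_script.php
```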

Mostly, all that this mini-project comes down to is sending appropriate data to appropriate URLs (implementing cookies as well, since you need a login), and doing some light DOM traversal along the way to control some of the flow (ie. whether or not an entity should be registered), and find URLs that your hyperlinks lead to (assuming they even change at all - you should investigate to find this out).

I'm not very familiar with Python (I've played with it a bit for Project Euler purposes, but not much beyond that), but I'd probably rate this at about a few hours of work (including researching all the libraries I need, as well as how to use them). Ideally, if everything works on the first try, I'd do it in, say, an hour (maybe half an hour in PHP, but PHP isn't really well-suited to tasks like this), but since you're dealing with a third-party system you have no control over, unforeseen problems are likely to come up, and that's where most of the time would be lost. Well, ideally, you could ask the website-owners to just grant you direct access to their database, and do everything in a few minutes, but it seems that this is not an option :P

Of course, if you have any other questions, feel free to ask - I've tried to be language-agnostic, but since you've already mentioned Python, I've done some quick google searches to find some suitable libraries for this, so I hope I've helped you at least a bit.

thoughtfully
Posts: 2244
Joined: Thu Nov 01, 2007 12:25 am UTC
Location: Minneapolis, MN

Re: How Do I Learn About [n]?

Postby thoughtfully » Sun Sep 02, 2012 11:36 pm UTC

Another relevant Python library is mechanize, which is based on a Perl module of the same name. I'm sure cURL is available for Perl as well, or any number of languages. I haven't done much work of this sort, but you might find one of them is a better fit for you than the other, or they might be both useful for distinct subtasks, etc.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
-- Antoine de Saint-Exupery

Shivahn
Posts: 2199
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: How Do I Learn About [n]?

Postby Shivahn » Tue Sep 04, 2012 6:13 pm UTC

Thanks! There looks to be a lot I can look into. I appreciate the suggestions/ideas.

styrofoam
Posts: 256
Joined: Sat May 08, 2010 3:28 am UTC

Re: How Do I Learn About [n]?

Postby styrofoam » Tue Apr 23, 2013 4:59 pm UTC

I'd recommend, instead of using Python or Perl, using Node.js with jsdom. Since it's just JavaScript and a DOM, you can use all the knowledge, documentation, and muscle memory of writing web apps to write your scraper, or even execute code from the page itself, if it relies on JS.

That's just me, though.
aadams wrote:I am a very nice whatever it is I am.

wolf99
Posts: 27
Joined: Thu Sep 20, 2012 3:47 pm UTC

Re: How Do I Learn About [n]?

Postby wolf99 » Mon Jul 29, 2013 4:40 pm UTC

Any good tutorial for VB.Net, slightly (but not too much) beyond the basics?

I'm pretty swish at C with embedded systems, understand the basics of the concepts involved in OOP and have thrown programs together in VB6 some time ago.
So the "introductions" or "hello world" style tutorials to VB don't normally go far enough. Other stuff I've come across skips a giant gap of material beyond that level, though...
I have tried working my way through MSDN, but it seems to be organised for reference rather than for linear learning.
Cheers

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

Re: How Do I Learn About [n]?

Postby Carnildo » Tue Oct 01, 2013 4:54 am UTC

Any suggestions for "Javascript as a 25th language" resources?

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: How Do I Learn About [n]?

Postby Jplus » Wed Oct 02, 2013 8:50 am UTC

I think the sections "Features" and "Syntax" of https://en.wikipedia.org/wiki/Javascript fit the bill.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

Diadem
Posts: 5652
Joined: Wed Jun 11, 2008 11:03 am UTC
Location: The Netherlands

Re: How Do I Learn About [n]?

Postby Diadem » Wed May 21, 2014 3:43 pm UTC

Does anybody have some good resources on WPF / Xaml with C#?

I have been trying to learn this now for a few weeks, but it has the steepest learning curve in the history of learning curves. Most resources out there seem to assume you already know everything there is to know about C# or .net, both of which I have 0 experience with. And with xaml everything seems to depend on everything else, so I don't know where to begin.
It's one of those irregular verbs, isn't it? I have an independent mind, you are an eccentric, he is round the twist
- Bernard Woolley in Yes, Prime Minister

Yakk
Poster with most posts but no title.
Posts: 11053
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: How Do I Learn About [n]?

Postby Yakk » Thu May 22, 2014 6:39 pm UTC

wolf99 wrote:Any good tutorial for VB.Net, slightly (but not too much) beyond the basics?

VB.Net basically is a thin skin on C#.net. The difference between the two languages is extremely small, with VB.net looking more like VB, and C#.net looking more like C++.

So I'd advise looking at both VB.net and C#.net sources. Find some interesting C# code, and learn how to transcribe it to VB.net.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

rath358
Best Dressed 2017
Posts: 944
Joined: Wed Jan 14, 2009 6:02 am UTC
Location: South Camberville

Re: How Do I Learn About [n]?

Postby rath358 » Mon Jun 30, 2014 2:06 pm UTC

So, I need to build a windows app in c++ that has a nice interface and such, instead of just taking arguments as I am used to doing for my coursework. I have been told that .NET is the way to go for this. (Qt appears to be out for legal reasons, and I have to stick with my c++ code because of another library I use)
Advice on where to learn the basics of .NET and how to integrate it with a c++ program?

Edit: upon further reading, it appears that although using c++ with winforms or wpf is "supported", it is not the suggested route to go down, the c++ winforms editor has been deprecated since VS2010, and there are few tutorials or useful repositories of knowledge for me to reference. So it looks like the best way forward is to design the interface in c#, and make a wrapper around the underlying CPP functionality. Any advice on how to get started with that?

FLHerne
Posts: 41
Joined: Fri Jan 13, 2012 9:44 pm UTC

Re: How Do I Learn About [n]?

Postby FLHerne » Mon Jul 07, 2014 8:18 pm UTC

rath358 wrote:Qt appears to be out for legal reasons.

It's under LGPL (among other things), so you can dynamically-link to it (as a DLL on Windows, or .so on Linux) from your program regardless of the license used for the rest of the program. :wink:

If you really need both static linking and a non-GPL license, Digia sell commercial licenses. No idea what the terms and pricing are, because my stuff is GPL. :P

rath358
Best Dressed 2017
Posts: 944
Joined: Wed Jan 14, 2009 6:02 am UTC
Location: South Camberville

Re: How Do I Learn About [n]?

Postby rath358 » Mon Jul 14, 2014 2:03 pm UTC

I am developing this for a commercial software product and our legal team is really picky about third party software, so I am afraid that it is still out. Thank you for that bit of knowledge, though.

Yakk
Poster with most posts but no title.
Posts: 11053
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: How Do I Learn About [n]?

Postby Yakk » Mon Jul 14, 2014 2:29 pm UTC

Is it a windows "modern" app?

XAML mediated by C++/CLI?

It is a good habit to split your UI from your Engine anyhow. Tangling the two is code smell.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

rath358
Best Dressed 2017
Posts: 944
Joined: Wed Jan 14, 2009 6:02 am UTC
Location: South Camberville

Re: How Do I Learn About [n]?

Postby rath358 » Mon Jul 14, 2014 2:45 pm UTC

The c++ side is written as a win32 console application. I wrote a basic wrapper around it last week, but have been away since.
I am working on learning c# and putting together a simple UI in c# with WPF, but I am at a loss on how to translate my c++ code into the c++/CLI format that seems to be required to put the pieces together.

Edit (7-16-14): Once I found the right resource, I was able to compile the c++ code into a DLL, after only a little hair pulling. I haven't successfully called it from C#, but that is only because I haven't learned enough c# to hack a test together yet.
For future googlers, this tutorial got me on the right path. I had to do a little additional searching and a fair bit of debugging to get my code to compile properly, but it only ended up being a few lines of source code modification for the DLL itself, another dozen or two to test it, and of course setting it all up as a new project in VS 2013 and fiddling with the properties.
Thanks for all of the help! It might not seem like much, but your comments fixed a couple of bits of incomplete/incorrect knowledge, and really set me on the right path.
Last edited by rath358 on Wed Jul 16, 2014 2:44 pm UTC, edited 1 time in total.

Yakk
Poster with most posts but no title.
Posts: 11053
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: How Do I Learn About [n]?

Postby Yakk » Mon Jul 14, 2014 5:54 pm UTC

Use C++/CLI as a layer between your C++ code and the C# code, don't turn your code into a bunch of C++/CLI.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

EvanED
Posts: 4327
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: How Do I Learn About [n]?

Postby EvanED » Mon Jul 14, 2014 6:06 pm UTC

Another option would be to write pure C++ and compile to a DLL; then you can make native calls from C# to that DLL. I don't know how ugly this is and you might have to write a bit of glue code, but it may or may not be nicer than C++/CLI stuff.

Sizik
Posts: 1163
Joined: Wed Aug 27, 2008 3:48 am UTC

Re: How Do I Learn About [n]?

Postby Sizik » Tue Jul 29, 2014 6:00 pm UTC

Any good resources for learning modern C++, given the fact that I know Java and am familiar with C?
gmalivuk wrote:
King Author wrote:If space (rather, distance) is an illusion, it'd be possible for one meta-me to experience both body's sensory inputs.
Yes. And if wishes were horses, wishing wells would fill up very quickly with drowned horses.

Yakk
Poster with most posts but no title.
Posts: 11053
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: How Do I Learn About [n]?

Postby Yakk » Tue Jul 29, 2014 7:16 pm UTC

By "know Java", I assume you don't consider generics complicated (i.e., you have a reasonable level of expertise). Could you implement your own class-instance OO framework in Java or C? (A "class" is plain old data that describes the layout of data in instances of that class, plus methods to operate on those instances, including creation and destruction. A reasonable framework also handles inheritance, virtual and non-virtual functions, and optional instance-local method overrides.)
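
To give a feel for what such a framework involves, here is a deliberately tiny sketch of the idea. It is written in Python rather than Java or C purely for brevity, every name in it (`make_class`, `new`, `send`) is made up for illustration, and it covers only inheritance, virtual dispatch and instance-local overrides.

```python
def make_class(name, parent=None, methods=None):
    """A 'class' is plain data: a name, an optional parent and a method table."""
    return {"name": name, "parent": parent, "methods": methods or {}}


def new(cls, **fields):
    """An instance is plain data too: its class, its fields, per-instance overrides."""
    return {"class": cls, "fields": fields, "overrides": {}}


def send(obj, method, *args):
    """Virtual dispatch: instance overrides first, then the class chain upwards."""
    if method in obj["overrides"]:
        return obj["overrides"][method](obj, *args)
    cls = obj["class"]
    while cls is not None:
        if method in cls["methods"]:
            return cls["methods"][method](obj, *args)
        cls = cls["parent"]
    raise AttributeError(method)


animal = make_class("Animal", methods={"speak": lambda self: "..."})
dog = make_class("Dog", parent=animal, methods={"speak": lambda self: "Woof"})
rex = new(dog, name="Rex")
print(send(rex, "speak"))  # Woof
```

Doing the same in C means managing the method tables and memory by hand, which is roughly the level of comfort the question is probing for.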

http://isocpp.org/blog/2014/03/effectiv ... ott-meyers looks good from the index.

If you are a novice at Java and C, that is probably the wrong spot to start.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

vodka.cobra
Posts: 370
Joined: Thu Mar 27, 2008 6:50 pm UTC
Location: Florida

Re: How Do I Learn About [n]?

Postby vodka.cobra » Wed Oct 28, 2015 4:11 am UTC

If you feel that it would be appropriate to add "application security" to the list in OP, there's a curated list on Github for that: https://github.com/paragonie/awesome-appsec
If the above comment has anything to do with hacking or cryptography, note that I work for a PHP security company and might know what I'm talking about.

Quizatzhaderac
Posts: 1510
Joined: Sun Oct 19, 2008 5:28 pm UTC
Location: Space Florida

Re: How Do I Learn About [n]?

Postby Quizatzhaderac » Thu Jun 30, 2016 6:19 pm UTC

Can anyone recommend any good in-depth resources on Spring and/or Hibernate? I could throw together a pet shop app easily enough, but I'm responsible for a few legacy apps, so I'm actually much more interested in how things don't work than how they do, if that makes any sense.
The thing about recursion problems is that they tend to contain other recursion problems.

