Friday 27 February 2015

Basics of Online Web Research, Web Mining & Data Extraction Services

The evolution of the World Wide Web and of search engines has put an abundant and ever-growing pile of data and information at our fingertips. The web has now become a popular and important resource for information research and analysis.

Today, web research services are becoming more and more sophisticated. They combine various elements, such as business intelligence and web interaction, to deliver the desired results.

Web researchers can retrieve web data using search engines (keyword queries) or by browsing specific web resources. However, neither of these methods is very effective on its own: keyword search returns a large chunk of irrelevant data, and since each webpage contains several outbound links, it is difficult to extract data by browsing, too.

Web mining is classified into web content mining, web usage mining and web structure mining. Content mining focuses on the search and retrieval of information from the web. Usage mining extracts and analyzes user behavior. Structure mining deals with the structure of hyperlinks.

Web mining services can be divided into three subtasks:

Information Retrieval (IR): The purpose of this subtask is to automatically find all relevant information and filter out the irrelevant. It uses search engines such as Google, Yahoo and MSN, along with other resources, to find the required information.

Generalization: The goal of this subtask is to explore users' interests using data extraction methods such as clustering and association rules. Since web data are dynamic and often inaccurate, it is difficult to apply traditional data mining techniques directly to the raw data.

Data Validation (DV): This subtask tries to uncover knowledge from the data provided by the former tasks. Researchers can test various models, simulate them and finally validate the given web information for consistency.

Should you have any queries regarding Web research or Data mining applications, please feel free to contact us. We would be pleased to answer each of your queries in detail.

Source: http://ezinearticles.com/?Basics-of-Online-Web-Research,-Web-Mining-and-Data-Extraction-Services&id=4511101

Thursday 26 February 2015

What is Data Mining? Why Data Mining is Important?

Data mining is the searching, collecting, filtering and analyzing of data. Large amounts of information can be retrieved in a wide range of forms, such as data relationships, patterns or significant statistical correlations. Today, the advent of computers, large databases and the internet makes it far easier to collect millions, billions and even trillions of pieces of data that can be systematically analyzed to help look for relationships and to seek solutions to difficult problems.

Governments, private companies, large organizations and businesses of all kinds collect large volumes of information for research and business development. All of this collected data can be stored for future use, and such information is valuable whenever it is required. Without it, searching for and finding the required information on the internet or in other resources takes a great deal of time.

Here is an overview of what data mining services typically include:

* Market research, product research, survey and analysis

* Collecting information about investors, funds and investments

* Forums, blogs and other resources for customer views/opinions

* Scanning large volumes of data

* Information extraction

* Pre-processing of data from the data warehouse

* Metadata extraction

* Online web data mining services

* Online data mining research

* Information research from online newspapers and news sources

* Excel sheet presentation of data collected from online sources

* Competitor analysis

* Data mining from books

* Information interpretation

* Updating collected data

After applying the data mining process, you can easily extract information from the filtered data and refine it further. This process is mainly divided into three stages: pre-processing, mining and validation. In short, online data mining is a process of converting raw data into authentic information.

Most importantly, it takes a lot of time to find important information in the data. If you want to grow your business rapidly, you must make quick and accurate decisions to grab timely available opportunities.

Outsourcing Web Research is one of the best data mining outsourcing organizations, with more than 17 years of experience in the market research industry. To know more about our company, please contact us.

Source: http://ezinearticles.com/?What-is-Data-Mining?-Why-Data-Mining-is-Important?&id=3613677

Sunday 22 February 2015

CSR in the Extraction Sector

A study commissioned by the Canadian Mining industry found that Canadian mining companies were involved in 4 times as many mining "incidents" as companies from other countries. The study was intended for internal consumption only but has been leaked to the press recently. The study found that Canadian mining companies were involved in nearly two thirds of the 171 "high profile" environmental and human rights violations it studied occurring between 1999 and 2009. Members of the mining industry pointed out that the occurrences are in proportion to their representation on the global mining scene, indicating that they were no better or worse than companies from other countries.

First, some background on the study. The study findings were captured in a report titled "Corporate Social Responsibility & the Canadian International Extractive Sector: A Survey". The report was prepared for the Prospectors and Developers Association of Canada (PDAC) by the Canadian Centre for the Study of Resource Conflict (CCSRC). The purpose of the study was to measure the level of Corporate Social Responsibility (CSR) in the "extractive" sector. The extractive sector, for those of us untutored in the terminology, means exploration, gas, oil, and mining companies. The document leaked to the press was a first draft of the report, not the final draft. I should also mention that there is a bill, C-300, before the Canadian parliament which would make financing for foreign ventures contingent on meeting federally defined CSR standards. The exploration, gas, oil, and mining companies, and the organizations which represent them, are very much against this bill. Leaking the negative aspects of this report was fortunate for those in support of bill C-300 and disastrous for those opposed to it.

One of the observations the report makes is that adoption of formal CSR policies by companies with international interests is "remarkably low", but that those companies which have adopted CSR policies have experienced positive outcomes. The CCSRC contacted 584 companies which they felt met their criteria to participate in the study. Of those, 202 chose to participate. The first survey question was "Do you have a CSR policy or Code of International Business Conduct?" 56 of the 202 companies had documented policies in place. The study broke the 202 companies they surveyed into "junior" and "major" companies. 50% of the companies designated as major had documented CSR policies while only 21% of junior companies had one.

The survey also asked about the positive effects of a CSR policy. 24% of respondents claimed a reduction in conflicts or complications, 62% claimed better community relations (relations with the communities they were doing business in), and 25% reported increased shareholder interest. On the downside, 24% reported increased administration costs and 25% reported increased operating costs. One question they failed to ask was whether the benefits outweighed the costs.

The information I've stated in the preceding 2 paragraphs was gleaned from the final draft of the report. I don't have access to the first draft, but apparently it described some of the 171 violations they were addressing in the study. I reported on one such violation in the Project Management Tips section of this web site under the title "CSR Problems". The incidents reported on reflect the difficulty faced by companies who conduct business in some international locations. These incidents juxtapose our Canadian values and ethics with those of the countries our exploration, gas, mining, and oil companies do business in. One incident reported on, and attributed to the mining company's lack of CSR by the media, pitted one host community against another, with the resulting violence blamed on the Canadian mining company. I'm not suggesting here that these companies have not made mistakes in the past, or that improvements cannot be made in their CSR efforts; I am suggesting that we should have realistic expectations about the effectiveness of a CSR policy to prevent any problems in a foreign venture.

A reasonable expectation in some cases would be that the company have a documented CSR policy which conforms to the standards and ethics of this country (Canada), abides by the laws of the host country, and conforms to the standards and ethics of the host country. The expectation should be tempered with the acknowledgment that the operating environment these companies encounter in host countries can be radically different from that found here. For example, when one community is in conflict with another over whether a mining operation should take place, we tend to look to non-violent forms of dispute resolution, where some countries may resort to extreme violence to settle the dispute. Canadian companies frequently hire locals as security guards to protect their property, as local authorities cannot perform this duty for one reason or another. It is reasonable to expect the hiring company to do its due diligence in hiring these people to ensure they don't create a threat to the surrounding community. It is not reasonable to expect that there will be no conflicts arising out of these situations. Where it is suspected that a security guard overstepped their authority, or engaged in illegal behaviour, it is reasonable to expect the employer to cooperate with the local authorities in the investigation.

North American companies doing business internationally have long had to deal with conflicts between acceptable corporate behaviour in their own country and acceptable behaviour in the host country. Bribery is the classic example. There are countries where bribery is not only accepted but essential to conducting business. Our laws will convict anyone proved to have offered a bribe but failure to pay the bribe may result in a failure to perform on the part of the North American company. Failure to perform might result in the loss of all or part of the company's investment in the project. Holding a company to this type of double standard can only result in one of 2 outcomes: the company will break the rule against bribery, or the company will cease to do business in that host country.

Since this web site is aimed at the project management community, let's draw some conclusions from the survey and CSR in general that may help project managers. The first conclusion I would draw from all of the above is that the CSR policy that governs your project must describe achievable goals. By this I mean that the goals, objectives, and standards stated in the policy must be within the project's power to achieve, or comply with. The second conclusion is that the right CSR policy carefully implemented can provide a business benefit to the organization. It is the project manager's job to ensure that those benefits are realized.

The goals and objectives of the project must include goals and objectives in support of the CSR policy. Those goals and objectives should be spelled out in the Project Charter, and the connection between them and the CSR policy clearly defined. Make sure that the CSR-related goals and objectives you set for the project are clearly defined, measurable, and obtainable, and then agree with your stakeholders on the conditions that will indicate the goals have been met. Check for CSR policy goals and objectives that might conflict with each other or with any of your project's goals and objectives, both CSR-related and non-CSR. Goals and objectives you feel might conflict with each other, or with the CSR policy, should be resolved by senior management. Start your escalation by drawing the project sponsor's attention to the conflict and asking for their help with resolution.

Source: http://ezinearticles.com/?CSR-in-the-Extraction-Sector&id=5675024

Thursday 19 February 2015

The Equipment Used in Mining

The Bureau of Labor Statistics reports that there are five major segments in the mining industry: oil and gas extraction, coal mining, non-metal mineral mining, metal ore mining and supporting activities. Each segment may need different equipment, but there are some types of mining equipment that are used by all segments of the mining industry.

Excavators

Excavators are the machines miners use today to break up and remove soil; traditionally, shovels and steam shovels did these jobs. An excavator is a vehicle that moves on standard wheels or on tracks, with a rotating platform and a bucket at the end of its arm for digging the soil.

Draglines

Draglines are very big earth-moving machines used in the mining industry to expose underlying mineral deposits and to drag away the dirt. According to Kentucky Coal Education, draglines are among the largest machines in the world; they can remove several hundred tons of material in one pass.

Drills

Drills are very important for miners that extract natural gas and oil. Miners use these machines to reach underground deposits before they pipe the resources to the surface. Besides being used in gas and oil extraction, these machines are also used to mine coal and minerals.

Roof bolters

These machines are used to prevent underground collapses while the mining process is in progress. They are also used to support the tunnel roofs in mining locations.

Continuous miners and longwall miners

Continuous miners are usually used by subterranean coal miners to scrape coal from the coal beds. Longwall miners, meanwhile, are machines that remove large, rectangular sections of coal instead of scraping coal from a bed.

Rock duster

Rock dusters are pressurized pieces of equipment used in coal mining to spray inert mineral dust over the highly flammable coal dust. This inert dust helps prevent accidental explosions and fires.

Source: http://ezinearticles.com/?The-Equipment-Used-in-Mining&id=5633103

Wednesday 18 February 2015

Coal Seam Gas - Extraction and Processing

With rapidly depleting natural resources, people around the globe are looking for new sources of energy. Lots of people don't think much of it, but this is an excellent ecological move forward and may even be a lucrative endeavour. Australia has one of the most significant deposits of a recently discovered gas known as coal seam gas. The deposit present in areas such as New South Wales is far more significant than the others, since it contains little methane and much more carbon dioxide.

What is coal seam gas?

Coal bed methane is the more general term for this substance. It is a form of natural gas taken from substantial coal beds. The existence of this material usually spelled hazard for many sites. This changed in recent decades, when specialists discovered its potential as an energy source. It is now among the most important sources of energy in a number of countries, particularly in North America. Extraction within Australia is developing rapidly because of rich deposits in various parts of the country.

Extraction

The extraction procedure is reasonably challenging. It calls for heavy drilling, water pumping, and tubing. Though there are a variety of different processes, pipeline construction (an initial step) is perhaps one of the most important. This foundational step can spell the difference between the failure or success of your undertaking.

Working with a Contractor

Pipeline construction and design is serious business. Seasoned contractors may be hard to get considering the fact that Australia's coal seam gas industry is still fairly young. You'll find only a limited number of completed and working projects across the country. There are several things to consider when getting a contractor for the project.

Find one with substantial experience in the industry sector. Some service providers have operations outside the country, especially in North America. This is something you should look for, as development of this gas source originated there. Providers with completed projects in that region will have the solutions required for any project to take off.

The construction process involves several basic steps. It is important that the service provider you work with addresses all of your needs. Below are a few of the important supplementary services to look for.

- Pipeline design, production, and installation

- Custom ploughing (to achieve specialized trenching requirements)

- Protection and repair of pipelines with the use of various liners

- Pressure assessment and commissioning

These are only the fundamentals of pipeline construction; sourcing coal seam gas involves many other tasks. Do thorough research to ensure the service provider you employ is capable of completing all the necessary work. Other elements of the undertaking include engineering as well as site preparation and rehabilitation. This industrial sector can be profitable if you make all the proper moves.

Avoid making uninformed decisions by doing as much research as you possibly can. Use the web to your advantage to look into a company's profile. Look for a portfolio of the projects they have completed in the past. You can gauge their trustworthiness based on their volume of clients. Check out the scope of their operations and the projects they finished.

You should also think about company policies concerning the quality of their work, safety and health, along with their policies concerning communities and the environment. These are seemingly minute but important details when searching for a contractor for pipeline construction projects.

Source: http://ezinearticles.com/?Coal-Seam-Gas---Extraction-and-Processing&id=6954936

Friday 13 February 2015

Dear Donna: Tread Lightly When Suggesting 'Man-Scaping'

Dear Donna,

A man I recently began dating needs some "man-scaping." I would find him much more appealing if he trimmed the hair in his ears and nose and on the back of his neck. He has hinted that he is buying me something for Valentine's Day. Do you think it would be appropriate to buy him a gift certificate to a spa and tell him he could benefit from some "man-scaping?" - Anonymous

Dear Anonymous,

Since Valentine's Day is about love and romance, I would not put the focus on "man-scaping" by buying him a gift certificate to a spa. There is no easy, tactful way to suggest to someone that they do something about ear and nose hair. One day when you are sitting close to him, whisper in his ear, "I think you could use some 'man-scaping.'" After you explain what "man-scaping" is, be ready with the card to the spa.

Dear Donna,

I am in my 40s, single and dating for the past five years. I meet men mostly through friends, work and online. The last two men I met assumed we would split the bill after lunch or dinner. The first one caught me off guard so I paid. I also immediately decided I would not see him again. The second added up my half of the bill and asked me what kind of tip I thought was appropriate. I told him I thought he should pay, and the date went downhill from there. Whatever happened to the gentleman pays? - Sarah

Dear Sarah,

This is a side effect of online dating. When you are meeting multiple women, it can be expensive to always be the one paying. If you are meeting someone for the first time, keep it simple. Agree to meet for one hour and not over lunch or dinner. Most men do not expect a lady to split the cost of a cup of coffee or a glass of wine. Bottom line, if he is interested in you, he gladly will pay.

Source: http://gazette.com/dear-donna-tread-lightly-when-suggesting-man-scaping/article/1545611

Monday 9 February 2015

I Don’t Need No Stinking API: Web Scraping For Fun and Profit

If you’ve ever needed to pull data from a third party website, chances are you started by checking to see if they had an official API. But did you know that there’s a source of structured data that virtually every website on the internet supports automatically, by default?
That’s right, we’re talking about pulling our data straight out of HTML — otherwise known as web scraping. Here’s why web scraping is awesome:

Any content that can be viewed on a webpage can be scraped. Period.

If a website provides a way for a visitor’s browser to download content and render that content in a structured way, then almost by definition, that content can be accessed programmatically. In this article, I’ll show you how.

Over the past few years, I’ve scraped dozens of websites — from music blogs and fashion retailers to the USPTO and undocumented JSON endpoints I found by inspecting network traffic in my browser.

There are some tricks that site owners will use to thwart this type of access — which we’ll dive into later — but they almost all have simple work-arounds.

Why You Should Scrape

But first we’ll start with some great reasons why you should consider web scraping first, before you start looking for APIs or RSS feeds or other, more traditional forms of structured data.

Websites are More Important Than APIs

The biggest one is that site owners generally care way more about maintaining their public-facing visitor website than they do about their structured data feeds.

We’ve seen it very publicly with Twitter clamping down on their developer ecosystem, and I’ve seen it multiple times in my projects where APIs change or feeds move without warning.

Sometimes it’s deliberate, but most of the time these sorts of problems happen because no one at the organization really cares or maintains the structured data. If it goes offline or gets horribly mangled, no one really notices.

Whereas if the website goes down or is having issues, that’s more of an in-your-face, drop-everything-until-this-is-fixed kind of problem, and gets dealt with quickly.

No Rate-Limiting

Another thing to think about is that the concept of rate-limiting is virtually non-existent for public websites.

Aside from the occasional captchas on sign up pages, most businesses generally don’t build a lot of defenses against automated access. I’ve scraped a single site for over 4 hours at a time and not seen any issues.

Unless you’re making concurrent requests, you probably won’t be viewed as a DDOS attack, you’ll just show up as a super-avid visitor in the logs, in case anyone’s looking.

Anonymous Access

There are also fewer ways for the website’s administrators to track your behavior, which can be useful if you want to gather data more privately.

With APIs, you often have to register to get a key and then send along that key with every request. But with simple HTTP requests, you’re basically anonymous besides your IP address and cookies, which can be easily spoofed.

The Data’s Already in Your Face

Web scraping is also universally available, as I mentioned earlier. You don’t have to wait for a site to open up an API or even contact anyone at the organization. Just spend some time browsing the site until you find the data you need and figure out some basic access patterns — which we’ll talk about next.

Let’s Get to Scraping

So you’ve decided you want to dive in and start grabbing data like a true hacker. Awesome.

Just like reading API docs, it takes a bit of work up front to figure out how the data is structured and how you can access it. Unlike APIs however, there’s really no documentation so you have to be a little clever about it.

I’ll share some of the tips I’ve learned along the way.

Fetching the Data

So the first thing you’re going to need to do is fetch the data. You’ll need to start by finding your “endpoints” — the URL or URLs that return the data you need.

If you know you need your information organized in a certain way — or only need a specific subset of it — you can browse through the site using their navigation. Pay attention to the URLs and how they change as you click between sections and drill down into sub-sections.

The other option for getting started is to go straight to the site’s search functionality. Try typing in a few different terms and again, pay attention to the URL and how it changes depending on what you search for. You’ll probably see a GET parameter like q= that always changes based on your search term.

Try removing other unnecessary GET parameters from the URL, until you’re left with only the ones you need to load your data. Make sure that there’s always a beginning ? to start the query string and a & between each key/value pair.
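For instance, here’s a minimal sketch in Python using the Requests library. The domain and parameter names below are invented, so swap in whatever you actually observed on your target site:

```python
import requests

# Hypothetical search endpoint -- substitute the one you found
# by watching the URL change as you browsed and searched.
BASE_URL = "https://example.com/search"

params = {
    "q": "blue widgets",  # the search-term parameter we spotted
    "page": 1,            # keep only the parameters the data actually needs
}

# Requests builds the ?q=...&page=... query string for us, so the
# leading "?" and the "&" separators are always well-formed.
response = requests.get(BASE_URL, params=params)
print(response.url)          # the final URL that was requested
print(response.status_code)  # 200 means the endpoint accepted it
```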

Dealing with Pagination

At this point, you should be starting to see the data you want access to, but there’s usually some sort of pagination issue keeping you from seeing all of it at once. Most regular APIs do this as well, to keep single requests from slamming the database.

Usually, clicking to page 2 adds some sort of offset= parameter to the URL, which is usually either the page number or else the number of items displayed on the page. Try changing this to some really high number and see what response you get when you “fall off the end” of the data.

With this information, you can now iterate over every page of results, incrementing the offset parameter as necessary, until you hit that “end of data” condition.
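In Python, that loop looks roughly like this. The endpoint, parameter names and page size are hypothetical; use the ones you observed, and note that the exact “end of data” test depends on what the site actually returns when you fall off the end:

```python
import requests

BASE_URL = "https://example.com/search"  # hypothetical endpoint
PAGE_SIZE = 50  # however many items the site shows per page

pages = []
offset = 0
while True:
    resp = requests.get(BASE_URL,
                        params={"q": "blue widgets", "offset": offset})
    # The end-of-data condition varies by site: an error status, an
    # empty result list, or a "no results" message are all common.
    if resp.status_code != 200 or "no results" in resp.text.lower():
        break
    pages.append(resp.text)
    offset += PAGE_SIZE

print(f"fetched {len(pages)} pages of results")
```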

The other thing you can try doing is changing the “Display X Per Page” which most pagination UIs now have. Again, look for a new GET parameter to be appended to the URL which indicates how many items are on the page.

Try setting this to some arbitrarily large number to see if the server will return all the information you need in a single request. Sometimes there’ll be some limits enforced server-side that you can’t get around by tampering with this, but it’s still worth a shot since it can cut down on the number of pages you must paginate through to get all the data you need.

AJAX Isn’t That Bad!

Sometimes people see web pages with URL fragments # and AJAX content loading and think a site can’t be scraped. On the contrary! If a site is using AJAX to load the data, that probably makes it even easier to pull the information you need.

The AJAX response is probably coming back in some nicely-structured way (probably JSON!) in order to be rendered on the page with Javascript.

All you have to do is pull up the network tab in Web Inspector or Firebug and look through the XHR requests for the ones that seem to be pulling in your data.

Once you find it, you can leave the crufty HTML behind and focus instead on this endpoint, which is essentially an undocumented API.
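Hitting one of these endpoints is about as easy as scraping gets. Here’s a sketch, assuming a made-up endpoint and a made-up JSON response shape:

```python
import requests

# Hypothetical XHR endpoint spotted in the network tab of the inspector.
AJAX_URL = "https://example.com/api/search"

resp = requests.get(AJAX_URL, params={"q": "blue widgets", "page": 1})
data = resp.json()  # the JSON body parses straight into Python objects

# No HTML parsing needed -- it's just dicts and lists from here.
# (The "results"/"title" keys are invented; inspect the real response.)
for item in data.get("results", []):
    print(item.get("title"))
```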

(Un)structured Data?

Now that you’ve figured out how to get the data you need from the server, the somewhat tricky part is getting the data you need out of the page’s markup.

Use CSS Hooks

In my experience, this is usually straightforward since most web designers litter the markup with tons of classes and ids to provide hooks for their CSS.

You can piggyback on these to jump to the parts of the markup that contain the data you need.

Just right click on a section of information you need and pull up the Web Inspector or Firebug to look at it. Zoom up and down through the DOM tree until you find the outermost <div> around the item you want.

This <div> should be the outer wrapper around a single item you want access to. It probably has some class attribute which you can use to easily pull out all of the other wrapper elements on the page. You can then iterate over these just as you would iterate over the items returned by an API response.
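Here’s a quick illustration with BeautifulSoup. The markup and class names are invented, but the pattern is exactly this:

```python
from bs4 import BeautifulSoup

# Toy markup standing in for what you found in the inspector.
html = """
<div class="result"><span class="title">First item</span></div>
<div class="result"><span class="title">Second item</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# The class on the wrapper <div> is our CSS hook: grab every wrapper,
# then drill into each one just like a record from an API response.
for wrapper in soup.find_all("div", class_="result"):
    print(wrapper.find("span", class_="title").get_text())
```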

A note here though: the DOM tree that is presented by the inspector isn’t always the same as the DOM tree represented by the HTML sent back by the website. It’s possible that the DOM you see in the inspector has been modified by Javascript — or sometimes even the browser, if it’s in quirks mode.

Once you find the right node in the DOM tree, you should always view the source of the page (“right click” > “View Source”) to make sure the elements you need are actually showing up in the raw HTML.

This issue has caused me a number of head-scratchers.

Get a Good HTML Parsing Library

It is probably a horrible idea to try parsing the HTML of the page as a long string (although there are times I’ve needed to fall back on that). Spend some time doing research for a good HTML parsing library in your language of choice.

Most of the code I write is in Python, and I love BeautifulSoup for its error handling and super-simple API. I also love its motto:

    You didn’t write that awful page. You’re just trying to get some data out of it. Beautiful Soup is here to help. :)

You’re going to have a bad time if you try to use an XML parser since most websites out there don’t actually validate as properly formed XML (sorry XHTML!) and will give you a ton of errors.

A good library will read in the HTML that you pull in using some HTTP library (hat tip to the Requests library if you’re writing Python) and turn it into an object that you can traverse and iterate over to your heart’s content, similar to a JSON object.

Some Traps To Know About

I should mention that some websites explicitly prohibit the use of automated scraping, so it’s a good idea to read your target site’s Terms of Use to see if you’re going to make anyone upset by scraping.

For two-thirds of the websites I’ve scraped, the above steps are all you need. Just fire off a request to your “endpoint” and parse the returned data.

But sometimes, you’ll find that the response you get when scraping isn’t what you saw when you visited the site yourself.

When In Doubt, Spoof Headers

Some websites require that your User Agent string is set to something they allow, or you need to set certain cookies or other headers in order to get a proper response.

Depending on the HTTP library you’re using to make requests, this is usually pretty straightforward. I just browse the site in my web browser and then grab all of the headers that my browser is automatically sending. Then I put those in a dictionary and send them along with my request.
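With the Requests library, that looks something like this. The header values below are just examples; copy the real ones out of your own browser:

```python
import requests

# Headers copied out of a normal browser session. These values are
# placeholders -- grab your own from your browser's network tab.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

resp = requests.get("https://example.com/some-page", headers=headers)
print(resp.status_code)  # hopefully a 200 now instead of a 403
```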

Note that this might mean grabbing some login or other session cookie, which might identify you and make your scraping less anonymous. It’s up to you how serious of a risk that is.

Content Behind A Login

Sometimes you might need to create an account and log in to access the information you need. If you have a good HTTP library that handles logins and automatically sends session cookies (did I mention how awesome Requests is?), then you just need to have your scraper log in before it gets to work.
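Here’s a rough sketch using a Requests Session. The login URL and form field names are made up; pull the real ones from the site’s actual login form:

```python
import requests

# Hypothetical login form -- take the real URL and field names
# from the site's login page.
LOGIN_URL = "https://example.com/login"

session = requests.Session()  # a Session persists cookies across requests
session.post(LOGIN_URL, data={"username": "me", "password": "secret"})

# The session cookie set during login is now sent automatically.
resp = session.get("https://example.com/members-only/data")
print(resp.status_code)
```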

Note that this obviously makes you totally non-anonymous to the third party website so all of your scraping behavior is probably pretty easy to trace back to you if anyone on their side cared to look.

Rate Limiting

I’ve never actually run into this issue myself, although I did have to plan for it one time. I was using a web service that had a strict rate limit that I knew I’d exceed fairly quickly.

Since the third party service conducted rate-limiting based on IP address (stated in their docs), my solution was to put the code that hit their service into some client-side Javascript, and then send the results back to my server from each of the clients.

This way, the requests would appear to come from thousands of different places, since each client would presumably have their own unique IP address, and none of them would individually be going over the rate limit.

Depending on your application, this could work for you.

Poorly Formed Markup

Sadly, this is the one condition that there really is no cure for. If the markup doesn’t come close to validating, then the site is not only keeping you out, but also serving a degraded browsing experience to all of their visitors.

It’s worth digging into your HTML parsing library to see if there’s any setting for error tolerance. Sometimes this can help.
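In BeautifulSoup, for instance, error tolerance mostly comes down to which underlying parser you choose. The html5lib parser processes markup the way browsers do and is the most forgiving:

```python
from bs4 import BeautifulSoup

broken = "<div><p>unclosed tags<div>everywhere"

# BeautifulSoup lets you swap in different parsers. html5lib
# (pip install html5lib) builds the same tree a browser would,
# so it copes best with mangled markup.
soup = BeautifulSoup(broken, "html5lib")
print(soup.prettify())
```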

If not, you can always try falling back on treating the entire HTML document as a long string and do all of your parsing as string splitting or — God forbid — a giant regex.



Well, there’s 2000 words to get you started on web scraping. Hopefully I’ve convinced you that it’s actually a legitimate way of collecting data.

It’s a real hacker challenge to read through some HTML soup and look for patterns and structure in the markup in order to pull out the data you need. It usually doesn’t take much longer than reading some API docs and getting up to speed with a client. Plus it’s way more fun!

Source: https://blog.hartleybrody.com/web-scraping/