Data Compliance & Ethical Scraping

How PullAPI approaches responsible data collection -- our principles, our practices, and our commitment to doing this the right way.

1. Our Commitment

PullAPI exists because we believe publicly available web data should be accessible for innovation, research, and fair competition. Search engines, price comparison services, academic institutions, and businesses of every size depend on public web data to build products, inform decisions, and create value. We provide the infrastructure that makes this access reliable and structured.

But access comes with responsibility. We are committed to collecting data ethically, respecting the ecosystems we operate in, and giving our customers the tools and transparency they need to use data lawfully. This is not just a legal obligation -- it is foundational to how we build and run PullAPI.

This page describes the principles and practices that guide our work. It is not a binding legal document (our Terms of Service and Privacy Policy serve that role), but a transparent statement of how we operate and why.

2. What We Collect

PullAPI collects only publicly accessible data -- the same information visible to any person visiting a website with a standard web browser. If you can see it by typing a URL into Chrome, that is the kind of data we work with.

We do not:

Access data behind logins, paywalls, or any form of authentication
Scrape private messages, non-public profiles, or confidential information
Harvest email addresses, passwords, or credentials from any source
Access data that requires accepting restrictive click-through agreements

Our scrapers behave like regular web browsers. They use the same HTTP protocols, request the same public pages, and read the same HTML that any visitor would see. The difference is that we extract the structured data from those pages and return it as clean JSON through our API -- saving our customers the engineering effort of doing it themselves.

3. Legal Framework

The legal landscape around web scraping of public data has matured significantly. Courts, regulators, and industry practice have established a clear foundation for what we do.

Key Legal Precedents

hiQ Labs v. LinkedIn (2022): After the U.S. Supreme Court vacated and remanded the case in 2021, the Ninth Circuit reaffirmed in 2022 its ruling that accessing publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA). The court recognized that public data on the open web is fundamentally different from private, access-controlled systems. This landmark case supports the view that scraping publicly available information is a legitimate practice.
EU Data Act (2024): The European Union's Data Act, which entered into force in January 2024, promotes access to and use of data to foster innovation, competition, and the development of new services. Its focus is data generated by connected products and related services rather than web scraping specifically, but it reflects regulatory recognition of the value of data accessibility for the broader economy.
Established Industry Practice: Web scraping of public data is practiced globally by search engines indexing the web, price comparison platforms aggregating listings, financial firms monitoring markets, academic researchers studying public discourse, and businesses tracking competitive landscapes. The practice is a recognized and essential part of how the internet functions.

Our Position

PullAPI accesses only publicly available data using standard web protocols. We do not bypass technical access controls where doing so would constitute unauthorized access under applicable law. We do not circumvent authentication systems, crack passwords, or exploit security vulnerabilities. When a website communicates access restrictions through technical measures, we respect those boundaries.

4. Responsible Scraping Practices

Being legally permitted to scrape public data does not mean doing so without care. We invest heavily in systems that minimize our impact on the websites we access.

Rate Limiting & Throttling

Every scraper in our system operates with request throttling. We limit the frequency of requests to any single target to avoid placing excessive load on their servers. Our systems are designed to be a polite visitor, not a burden.
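
The following is a minimal sketch of how per-target throttling of this kind could work. It is not PullAPI's actual implementation: the class name HostThrottle, the idea of keying the limit on the URL's host, and the two-second interval in the usage note are all assumptions made for illustration.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum interval between requests to the same host.

    Hypothetical helper for illustration only.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}      # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to url's host is allowed; return the seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        # If the host has never been seen, the request may start immediately.
        start = max(now, self._next_allowed.get(host, now))
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot for this host only; other hosts are unaffected.
        self._next_allowed[host] = start + self.min_interval_s
        return delay
```

A caller would create one shared instance, for example HostThrottle(2.0), and call wait(url) immediately before each outgoing request. Requests to different hosts do not delay each other.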

Intelligent Caching

API responses are cached for 15 minutes to 1 hour depending on the data type. When multiple customers request the same data within the cache window, we serve it from cache rather than making another request to the source. This dramatically reduces the total number of requests we make to any given website -- often by 60-80% or more during peak usage.
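
As an illustration of the mechanism, here is a small sketch of a time-to-live cache that sits in front of a source fetch. The class TTLCache, the key format, and the fetch callable are hypothetical names chosen for this example, and the in-memory dictionary is an assumption rather than a description of our production cache.

```python
import time


class TTLCache:
    """Serve repeated requests from memory until each entry's TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key, ttl_s, fetch):
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()      # the only place the source website is contacted
        self._entries[key] = (now + ttl_s, value)
        return value

    def purge_expired(self):
        """Delete every entry whose TTL has passed; return how many were removed."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

With a 15-minute TTL (900 seconds), five identical requests arriving inside one window produce one request to the source and four cache hits, which is an 80% reduction for that key.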

Minimal Data Footprint

We extract only the structured data fields our customers need -- not entire pages, not all assets, not images or scripts. A typical request retrieves a fraction of the data that a normal browser visit would download, consuming far less bandwidth from the target server.

Graceful Error Handling

When a target server returns errors or signals overload (via HTTP 429, 503, or similar status codes), our systems respect those signals. We implement exponential backoff and reduce request frequency automatically. We do not hammer servers that are indicating distress.
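
The sketch below shows one common way to implement exponential backoff for 429 and 503 responses. It is illustrative only: the function names, the one-second base delay, the 60-second cap, and the send callable (assumed to return a status code, a headers dictionary, and a body) are assumptions for this example. It also honors a Retry-After header when the server sends one in its delay-in-seconds form.

```python
import random
import time

RETRYABLE_STATUSES = {429, 503}


def backoff_delay(attempt, base_s=1.0, cap_s=60.0, jitter=0.0):
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped.

    jitter is an optional fraction of the delay added at random so that many
    clients do not retry in lockstep; 0.0 keeps the delay deterministic.
    """
    delay = min(cap_s, base_s * 2 ** attempt)
    return delay + random.uniform(0, jitter * delay)


def fetch_with_backoff(send, max_attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, headers, body).
    Returns the final (status_code, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_attempts - 1:
            break  # give up instead of sleeping after the last attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server stated how long to wait; prefer that over our own schedule.
            delay = float(retry_after)
        else:
            delay = backoff_delay(attempt, **backoff_kwargs)
        sleep(delay)
    return status, body
```

With the defaults, successive waits are 1, 2, 4, and 8 seconds, so request frequency drops quickly while the server is signaling overload.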

5. Privacy by Design

Privacy is not an afterthought at PullAPI -- it is built into our architecture from the ground up.

EU-Hosted Infrastructure

Our servers run in Helsinki, Finland, hosted by Hetzner -- a German infrastructure provider subject to EU data protection law. This means the data we process benefits from the strong privacy framework of the European Union, including GDPR protections, regardless of where our customers are located.

Data Minimization & Auto-Expiry

We cache only the structured data needed to serve API responses, and every cache entry has a strict time-to-live (TTL). After the TTL expires, the data is automatically purged. We do not maintain long-term archives of scraped content. Request logs are retained for 90 days for debugging purposes and then deleted.
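
To make the retention rule concrete, here is a short sketch of a 90-day purge for request logs. The record layout (a dictionary with a timezone-aware "timestamp" field) and the function name purge_old_logs are assumptions for illustration; a real system would more likely run an equivalent delete against its log store.

```python
from datetime import datetime, timedelta, timezone

LOG_RETENTION = timedelta(days=90)  # retention period stated in this section


def purge_old_logs(logs, now=None, retention=LOG_RETENTION):
    """Return only the log records that are still inside the retention period.

    logs is a list of dicts with a timezone-aware "timestamp" (assumed layout).
    Records exactly at or beyond the retention age are dropped.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [record for record in logs if record["timestamp"] > cutoff]
```

Running a job like this on a schedule keeps the retained log history bounded to the retention window.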

No Cross-Source Profiling

We do not aggregate personal data from different source websites to build individual profiles. If our real estate API returns a listing agent's public contact information and our business API returns a company profile, we do not link those records together. Each data source is treated independently.

Customer Responsibility

Our Acceptable Use Policy requires customers to have a lawful basis for processing any personal data contained in API responses. Customers are data controllers under GDPR and bear responsibility for how they use the data they receive. We provide the access; they are accountable for their use of it.

6. Data Subject Requests

If your personal information appears in data returned by our API -- for example, as a publicly listed real estate agent, a business owner, or a public profile -- you have rights under GDPR, CCPA, and other data protection regulations.

You can contact us at privacy@pullapi.com to:

Request information about what data related to you may appear in our API responses
Request exclusion of specific data from our cache and future API responses
Exercise your right to erasure under GDPR Article 17 or CCPA deletion rights
Object to processing of your personal data through our services

We respond to all data subject requests within 30 days. In many cases, we can act faster. When we receive a valid request, we will exclude the relevant data from our cache and configure our systems to omit it from future responses.
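
One way to omit excluded data from future responses is a suppression list that is checked before any record is returned. The sketch below is a hypothetical illustration, not a description of our internal tooling: the field names agent_name and agent_phone, the SuppressionList class, and the choice to strip personal fields rather than drop the whole record are all assumptions.

```python
PERSONAL_FIELDS = {"agent_name", "agent_phone"}  # hypothetical field names


def _norm(value):
    """Normalize case and whitespace so small formatting differences still match."""
    return " ".join(str(value).lower().split())


class SuppressionList:
    """Remember excluded (field, value) pairs and strip them from responses."""

    def __init__(self):
        self._entries = set()

    def add(self, field, value):
        """Register a value that must no longer appear in API responses."""
        self._entries.add((field, _norm(value)))

    def matches(self, record):
        """True if any personal field in the record is on the suppression list."""
        return any(
            (field, _norm(record[field])) in self._entries
            for field in PERSONAL_FIELDS
            if field in record
        )

    def apply(self, records):
        """Return copies of records with personal fields removed where a match exists."""
        cleaned = []
        for record in records:
            if self.matches(record):
                record = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
            cleaned.append(record)
        return cleaned
```

Applying the same check when writing to the cache as well as when building responses would cover both parts of the commitment above.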

For a full description of your privacy rights, please see our Privacy Policy.

7. Website Owner Requests

We recognize that website owners invest significant effort in creating and maintaining their platforms. If you operate a website and have concerns about PullAPI accessing your publicly available data, we want to hear from you.

When you contact us, we will:

Review your request in good faith -- we take every inquiry seriously and respond promptly, typically within 5 business days
Discuss your specific concerns -- whether related to request volume, data types, or general access patterns
Respect clearly communicated preferences regarding automated access to your platform
Adjust our practices where appropriate, including modifying request rates, excluding specific data, or other reasonable accommodations

We believe that open communication leads to better outcomes for everyone. Many of the websites we access also benefit from the ecosystem of data accessibility -- their listings gain wider exposure, their businesses get more visibility, and their users find information more easily. We are always open to finding arrangements that work for all parties.

8. Continuous Improvement

The web evolves, regulations develop, and best practices advance. Our approach to ethical data collection is not static -- it improves continuously.

We regularly:

Review our scraping practices against current legal developments and industry standards
Update our policies to reflect new regulations and guidance from data protection authorities
Invest in technology that reduces our footprint -- better caching, smarter rate limiting, and more efficient data extraction
Listen to feedback from data subjects, website owners, customers, and the broader community

If you have questions, suggestions, or concerns about any of our data practices, we welcome your input. Reach out to us at legal@pullapi.com or privacy@pullapi.com.