Class CrawlerSessionManagerValve

All Implemented Interfaces:
MBeanRegistration, Contained, JmxEnabled, Lifecycle, Valve

public class CrawlerSessionManagerValve extends ValveBase
Web crawlers can trigger the creation of many thousands of sessions as they crawl a site, which may result in significant memory consumption. This Valve ensures that each crawler is associated with a single session, just like a normal user, regardless of whether or not it provides a session token with its requests.
  • Constructor Details

    • CrawlerSessionManagerValve

      public CrawlerSessionManagerValve()
      Specifies a default constructor so async support can be configured.
  • Method Details

    • setCrawlerUserAgents

      public void setCrawlerUserAgents(String crawlerUserAgents)
      Specify the regular expression (using Pattern) that will be used to identify crawlers based on the User-Agent header provided. The default is ".*GoogleBot.*|.*bingbot.*|.*Yahoo! Slurp.*".
      Parameters:
      crawlerUserAgents - The regular expression using Pattern
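      To illustrate how such an expression behaves, here is a minimal sketch that applies the default quoted above to a few User-Agent strings. It assumes the whole header value must match the expression; the class name, method name, and sample header values are hypothetical and not part of this API.

```java
import java.util.regex.Pattern;

class CrawlerUserAgentMatchSketch {

    // Default expression quoted in the setCrawlerUserAgents description.
    static final Pattern DEFAULT_CRAWLER_UA =
            Pattern.compile(".*GoogleBot.*|.*bingbot.*|.*Yahoo! Slurp.*");

    // Assumption: the entire header value has to match, not just a substring.
    static boolean looksLikeCrawler(String userAgent) {
        return userAgent != null && DEFAULT_CRAWLER_UA.matcher(userAgent).matches();
    }

    public static void main(String[] args) {
        // Matches the ".*bingbot.*" alternative.
        System.out.println(looksLikeCrawler("Mozilla/5.0 (compatible; bingbot/2.0)"));
        // An ordinary browser header matches none of the alternatives.
        System.out.println(looksLikeCrawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"));
        // Pattern is case-sensitive by default, so "Googlebot" does not match "GoogleBot".
        System.out.println(looksLikeCrawler("Mozilla/5.0 (compatible; Googlebot/2.1)"));
    }
}
```

      Because matching is case-sensitive unless the expression says otherwise, a custom expression may need a character class such as [bB]ot or an embedded (?i) flag to catch spelling variants.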
    • getCrawlerUserAgents

      public String getCrawlerUserAgents()
      Get the regular expression used to identify crawlers based on the User-Agent header.
      Returns:
      The current regular expression being used to match user agents
      See Also:
      setCrawlerUserAgents(String)
    • setCrawlerIps

      public void setCrawlerIps(String crawlerIps)
      Specify the regular expression (using Pattern) that will be used to identify crawlers based on their IP address. The default is no crawler IPs.
      Parameters:
      crawlerIps - The regular expression using Pattern
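      As a rough illustration of an IP-based expression, the sketch below matches two address ranges reserved for documentation. The expression, class name, and method name are hypothetical, and full-string matching is assumed.

```java
import java.util.regex.Pattern;

class CrawlerIpMatchSketch {

    // Hypothetical expression covering 192.0.2.x and 198.51.100.x.
    // Dots are escaped so that they match a literal '.' only.
    static final Pattern CRAWLER_IPS =
            Pattern.compile("192\\.0\\.2\\.\\d{1,3}|198\\.51\\.100\\.\\d{1,3}");

    // Assumption: the entire remote address has to match the expression.
    static boolean isCrawlerIp(String remoteAddr) {
        return remoteAddr != null && CRAWLER_IPS.matcher(remoteAddr).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCrawlerIp("192.0.2.17"));   // inside the first range
        System.out.println(isCrawlerIp("192.0.20.17"));  // third octet differs
        System.out.println(isCrawlerIp("203.0.113.5"));  // outside both ranges
    }
}
```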
    • getCrawlerIps

      public String getCrawlerIps()
      Get the regular expression used to identify crawlers based on their IP address.
      Returns:
      The current regular expression being used to match IP addresses
      See Also:
      setCrawlerIps(String)
    • setSessionInactiveInterval

      public void setSessionInactiveInterval(int sessionInactiveInterval)
      Specify the session timeout (in seconds) for a crawler's session. This is typically lower than that for a user session. The default is 60 seconds.
      Parameters:
      sessionInactiveInterval - The new timeout for crawler sessions
    • getSessionInactiveInterval

      public int getSessionInactiveInterval()
      Get the session timeout for a crawler's session.
      Returns:
      The current timeout in seconds
      See Also:
      setSessionInactiveInterval(int)
    • getClientIpSessionId

      public Map<String,String> getClientIpSessionId()
      Get the map of client identifiers to session IDs.
      Returns:
      The map of client identifiers to session IDs
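      The idea behind a map of client identifiers to session IDs can be sketched as follows. This is a simplified model under stated assumptions, not the valve's implementation: the class, its methods, and the generated session IDs are hypothetical, and removal of entries when a session expires is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CrawlerSessionReuseSketch {

    // Client identifier -> ID of the session already created for that client.
    private final Map<String, String> clientIdSessionId = new ConcurrentHashMap<>();

    // Counts how many sessions this sketch has had to create.
    private final AtomicInteger created = new AtomicInteger();

    // Returns the existing session ID for the client, creating one on first use.
    String sessionFor(String clientIdentifier) {
        return clientIdSessionId.computeIfAbsent(
                clientIdentifier, id -> "session-" + created.incrementAndGet());
    }

    int sessionsCreated() {
        return created.get();
    }

    public static void main(String[] args) {
        CrawlerSessionReuseSketch sketch = new CrawlerSessionReuseSketch();
        // Three requests from the same crawler, none carrying a session token.
        for (int i = 0; i < 3; i++) {
            System.out.println(sketch.sessionFor("192.0.2.17"));
        }
        System.out.println("sessions created: " + sketch.sessionsCreated());
    }
}
```

      Without the lookup, each of the three requests would have produced its own session.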
    • isHostAware

      public boolean isHostAware()
      Determine whether the client identifier includes the host name.
      Returns:
      true if the client identifier includes the host name
    • setHostAware

      public void setHostAware(boolean isHostAware)
      Set whether the client identifier should include the host name.
      Parameters:
      isHostAware - true if the client identifier should include the host name
    • isContextAware

      public boolean isContextAware()
      Determine whether the client identifier includes the context name.
      Returns:
      true if the client identifier includes the context name
    • setContextAware

      public void setContextAware(boolean isContextAware)
      Set whether the client identifier should include the context name.
      Parameters:
      isContextAware - true if the client identifier should include the context name
    • setClientIdentifierFunction

      public void setClientIdentifierFunction(Function<Request,String> clientIdentifierFunction)
      Specify the function that will be used to build the identifier for each unique client. The default is to use the client IP address, optionally combined with the host name and context name.
      Parameters:
      clientIdentifierFunction - The new function used to build identifiers for clients.
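      A sketch of what an identifier function of this shape might look like is shown below. RequestInfo is a hypothetical stand-in for the request type so that the example runs without a servlet container, and the '-' separator is an assumption rather than the documented format.

```java
import java.util.function.Function;

class ClientIdentifierSketch {

    // Hypothetical stand-in for the request data the function reads.
    static final class RequestInfo {
        final String remoteAddr;
        final String hostName;
        final String contextName;

        RequestInfo(String remoteAddr, String hostName, String contextName) {
            this.remoteAddr = remoteAddr;
            this.hostName = hostName;
            this.contextName = contextName;
        }
    }

    // Builds an identifier from the client IP, optionally adding host and context.
    static Function<RequestInfo, String> identifierFunction(boolean hostAware,
                                                            boolean contextAware) {
        return req -> {
            StringBuilder id = new StringBuilder(req.remoteAddr);
            if (hostAware) {
                id.append('-').append(req.hostName);
            }
            if (contextAware && req.contextName != null) {
                id.append('-').append(req.contextName);
            }
            return id.toString();
        };
    }

    public static void main(String[] args) {
        RequestInfo req = new RequestInfo("192.0.2.17", "example.org", "/shop");
        System.out.println(identifierFunction(false, false).apply(req));
        System.out.println(identifierFunction(true, true).apply(req));
    }
}
```

      Including the host or context in the identifier means the same crawler is tracked separately per host or per web application instead of sharing one entry.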
    • initInternal

      protected void initInternal() throws LifecycleException
      Description copied from class: LifecycleBase
      Subclasses implement this method to perform any instance initialisation required.
      Overrides:
      initInternal in class ValveBase
      Throws:
      LifecycleException - If the initialisation fails
    • invoke

      public void invoke(Request request, Response response) throws IOException, ServletException
      Description copied from interface: Valve

      Perform request processing as required by this Valve.

      An individual Valve MAY perform the following actions, in the specified order:

      • Examine and/or modify the properties of the specified Request and Response.
      • Examine the properties of the specified Request, completely generate the corresponding Response, and return control to the caller.
      • Examine the properties of the specified Request and Response, wrap either or both of these objects to supplement their functionality, and pass them on.
      • If the corresponding Response was not generated (and control was not returned), call the next Valve in the pipeline (if there is one) by executing getNext().invoke().
      • Examine, but not modify, the properties of the resulting Response (which was created by a subsequently invoked Valve or Container).

      A Valve MUST NOT do any of the following things:

      • Change request properties that have already been used to direct the flow of processing control for this request (for instance, trying to change the virtual host to which a Request should be sent from a pipeline attached to a Host or Context in the standard implementation).
      • Create a completed Response AND pass this Request and Response on to the next Valve in the pipeline.
      • Consume bytes from the input stream associated with the Request, unless it is completely generating the response, or wrapping the request before passing it on.
      • Modify the HTTP headers included with the Response after the getNext().invoke() method has returned.
      • Perform any actions on the output stream associated with the specified Response after the getNext().invoke() method has returned.
      Parameters:
      request - The servlet request to be processed
      response - The servlet response to be created
      Throws:
      IOException - if an input/output error occurs, or is thrown by a subsequently invoked Valve, Filter, or Servlet
      ServletException - if a servlet error occurs, or is thrown by a subsequently invoked Valve, Filter, or Servlet
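      The ordering rules of the Valve contract above can be pictured with a small model. SketchValve and its subclasses are hypothetical stand-ins written for this illustration; they are not the Valve interface and carry an event list instead of a Request and Response.

```java
import java.util.ArrayList;
import java.util.List;

class ValveChainSketch {

    // Hypothetical stand-in for a valve in a pipeline.
    abstract static class SketchValve {
        private SketchValve next;

        SketchValve getNext() {
            return next;
        }

        void setNext(SketchValve next) {
            this.next = next;
        }

        abstract void invoke(List<String> events);
    }

    // Examines the request, delegates, then only observes the result.
    static class ObservingValve extends SketchValve {
        @Override
        void invoke(List<String> events) {
            events.add("observer: before next");
            if (getNext() != null) {
                getNext().invoke(events);
            }
            // After the next valve returns: examine, but do not modify.
            events.add("observer: after next");
        }
    }

    // Completely generates the response and does not call a next valve.
    static class TerminalValve extends SketchValve {
        @Override
        void invoke(List<String> events) {
            events.add("terminal: response generated");
        }
    }

    // Wires observer -> terminal and returns the recorded order of events.
    static List<String> run() {
        SketchValve first = new ObservingValve();
        first.setNext(new TerminalValve());
        List<String> events = new ArrayList<>();
        first.invoke(events);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```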